Sir:

I, William T. McAllister, a citizen of the United States of America, hereby declare and state:

1. My qualifications are set forth in the attached curriculum vitae.

2. I am an inventor of the above-identified patent application. 

3. I am familiar with its contents and the contents of the pending Office Action 
and the accompanying Amendment After Final Rejection under 37 C.F.R. 1.116. 

4. I have read and understand the attached references, and believe that the 
teachings of the attached references represent the state of the art at the time the application 
was filed. I believe that the specification as originally filed fully enabled one skilled in the art 
to make and use the invention as recited in the claims as amended by the accompanying 
Amendment After Final Rejection for the reasons set forth below. 

5. One skilled in the art would have understood that RNA polymerases encoded 
by bacteriophage T7 and its relatives have many common structural and functional features. 
The promoter sequences recognized by these phage polymerases, such as the T7, T3, K11, SP6, and BAM phage polymerases, all share a common 23 bp consensus sequence from nucleotide -17 to +6 (see, e.g., McAllister, Cellular and Molecular Biology Research (1993) 39: 385-391). Because of the structural and functional similarities of the RNA polymerases encoded by these bacteriophages, those skilled in the art refer to these RNA polymerases as "T7-like RNA polymerases" and to the phages as "T7-like bacteriophages." (See, e.g., Chamberlin et al. (copy submitted with June 5, 2001, Amendment) at page 88, first paragraph, and page 89, Heading 11). Thus, one skilled in the art would clearly have understood the term "T7-like phage polymerase" as referring to the RNA polymerases encoded by T7 and its related phages.

6. In addition to recognizing this consensus sequence within the transcription promoter sequence, T7-like phage RNA polymerases also share a common organization. It is known in the art that these RNA polymerases consist of a single subunit (see, e.g., Severinov, PNAS (2000) 98: 5-7; Tahirov et al., Nature (2002) 420: 43-50; and Yin et al., Science (2002) 298: 1387-1395). These references show that two "families" of DNA-dependent RNA polymerases are recognized in the art. One family encompasses the T7-like phage polymerases, which consist of a single subunit, while the second family covers bacterial and eukaryotic RNA polymerases, which consist of multiple subunits. As described in the specification at, for example, page 4, line 30 to page 53, the T7-like phage polymerases are an art-recognized class of very homologous enzymes. This is also supported by the discussion in Chamberlin et al. from page 89 to 91. Severinov, Tahirov et al., and Yin et al. further confirm the accuracy of this grouping of phage RNA polymerases. Thus, those skilled in the art would have recognized that the T7-like phage polymerases are a closely related group of RNA polymerases.

7. The ability to synthesize a polymer of nucleotides is conferred by the active site of the enzyme, which is highly conserved among the T7-like phage polymerases. An alignment of the exemplary T7, T3, SP6, and K11 RNA polymerases is attached. The alignment shows that the amino acid sequence from residue 620 to about 640 is highly conserved across the different types of T7-like phage polymerases.

8. The conserved amino acid sequence in this region of the T7-like phage RNA polymerases would also have suggested to one skilled in the art that similar changes made in the related phage polymerases would result in the same or similar mutant phenotypes. Thus, one skilled in the art would have reasonably expected that a mutation at R627, made in the manner described in the specification with respect to T7 RNA polymerase, would produce in other T7-like phage polymerases a mutant phenotype similar to that of the T7 RNA polymerase. It is a generally accepted and routine practice among those skilled in the art to compare the amino acid sequences of related proteins to localize areas of importance and interest.
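
The routine sequence comparison referred to above can be illustrated with a minimal Python sketch. It scores each column of a pre-computed alignment by the fraction of sequences that share the most common residue, then reports windows in which every sequence is identical. The function names, the identity-only scoring rule, and the four short sequences are hypothetical illustrations. They are not the attached T7, T3, SP6, and K11 alignment, nor the method used to prepare it.

```python
from collections import Counter


def column_identity(aligned):
    """For each alignment column, return the fraction of sequences that carry
    the most common residue in that column (gap '-' is treated as a residue)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in aligned]
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / len(aligned))
    return scores


def conserved_windows(scores, window, threshold):
    """Return 0-based start columns of windows whose mean identity >= threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append(start)
    return hits


# Hypothetical toy alignment; NOT real phage RNA polymerase sequences.
toy_alignment = [
    "MKARDLQGYYHSE",
    "MKARDLQGFWHTE",
    "MRSRDLQGYYNSE",
    "MKTRDLQGAYHSD",
]
scores = column_identity(toy_alignment)
print(conserved_windows(scores, window=5, threshold=1.0))  # [3]
```

For the toy input, the only fully conserved five-residue window starts at 0-based column 3, which corresponds to alignment positions 4 to 8. A fuller analysis could instead weight similar residues with a substitution matrix rather than requiring strict identity.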

9. The skilled artisan would have determined an appropriate mutagenesis strategy based on a comparison of the amino acid sequences and structures. Thus, there would have been no need for the skilled artisan to examine multiple mutations at every possible position within the protein, as asserted in the Office Action. The demonstration of one mutation affecting T7 RNA polymerase activity within a highly conserved region of the amino acid sequence shared by the T7-like phage RNA polymerases would have been expected to yield similar results in other T7-like phage polymerases. Thus, no undue experimentation would have been necessary to practice the claimed invention with various alternative T7-like phage polymerases.

Application No. 09/402,131

10. The specification describes the modification of a T7 RNA polymerase at residue R627. As a result of this modification, the RNA-dependent RNA polymerase activity of T7 RNA polymerase is greatly enhanced. As discussed above, this particular residue lies within the highly conserved region, between amino acid residues 620 and about 640, that is shared by the T7-like phage RNA polymerases. Thus, one skilled in the art would have expected that the same or similar modification in the highly conserved region of a different, but related, T7-like phage RNA polymerase would also enhance the RNA-dependent RNA polymerase activity of that related polymerase.

11. Thus, in view of the attached references, the specification as filed provides a 
fully enabling disclosure for the claimed invention. One skilled in the art would not have 
required further guidance or examples, nor would undue experimentation have been required 
to practice the claimed invention beyond what is disclosed in the specification. 

12. I hereby declare that all statements made herein of my own knowledge are 
true, and that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false statements and 
the like so made are punishable by fine and/or imprisonment under Section 1001 of Title 18 
of the United States Code, and that such willful false statements may jeopardize the validity 
of the application or any patent issuing therefrom. 

Date: 

William T. McAllister 

Abstract — A consideration of the properties of a number of mutants of T7 RNA polymerase, together with emerging structural information (Sousa et al., 1993), allows an interpretation of the mechanics of transcription by this relatively simple RNA polymerase. Evidence indicating features in common with other nucleotide polymerases (such as DNA polymerases and reverse transcriptases) is reviewed.

Keywords — DNA polymerase, Reverse transcriptase, Promoter structure 



INTRODUCTION 

Unlike the multisubunit DNA-dependent RNA polymerases (RNAPs) of eukaryotic cells and bacteria, the RNAPs that are encoded by bacteriophage T7 and its relatives consist of a single species of protein that is capable of accurate transcription in the absence of any apparent need for auxiliary transcription factors (Chamberlin and Ryan, 1983). The striking simplicity of this transcription system makes it ideally suited for studies of RNA polymerase structure and function. The gene that encodes the phage RNAP has been cloned and may be overexpressed in bacterial cells, allowing genetic and biochemical manipulation of the enzyme (Davanloo et al., 1984). Importantly, T7 RNAP has now been crystallized, and a number of mutants that are altered in the transcription cycle have been characterized (Bonner et al., 1992; Patra et al., 1992; Gross et al., 1993; Sousa et al., 1993).

In our work, we have taken the approach of isolating or engineering T7 RNAP mutants with defined biochemical defects and asking whether these defects can be correlated with structural information so as to interpret the mechanism of transcription. We have identified important functional domains in the RNAP, and have found that the phage RNAP exhibits interesting structural and functional homologies to other simple nucleotide polymerases, such as DNA polymerases and reverse transcriptases. Although no extended sequence homologies exist between the phage RNAPs and the multisubunit RNAPs, there are intriguing clues that suggest a relationship between the phage enzymes and certain subunits of the more complex RNAPs. Studies of this class of RNAP will, therefore, contribute significantly to our understanding of nucleotide polymerization.

MATERIALS AND METHODS 

Transcription reactions 

Mutant RNAPs have been previously described (Gross et al., 1993); the designation insxxx indicates a linker insertion mutation that lies within or immediately preceding codon xxx. All transcription reactions were carried out in a volume of 10 µl containing 20 mM Tris-HCl (pH 7.9), 8 mM MgCl2, 2 mM spermidine-HCl, 1 mM dithiothreitol, 0.5 mM each of ATP, GTP, CTP, and UTP (Pharmacia, Ultrapure), and 1 µl cell extract (Gross et al., 1993). The products were resolved by electrophoresis in 20% polyacrylamide gels followed by autoradiography (ibid).

RESULTS 

Enzyme domains involved in promoter recognition 

T7 RNAP is the prototype of a class of single-subunit DNA-dependent RNAPs that includes the RNAPs of related phages such as T3, SP6, and K11, as well as the mitochondrial RNAPs and, potentially, a chloroplast RNAP (for review, see McAllister and Raskin, 1993). Although the other phage RNAPs are closely related to T7 RNAP by sequence homology, each phage RNAP exhibits its own characteristic specificity. A comparison of the sequences of the phage promoters reveals a common 23 bp consensus sequence that extends from -17 to +6, with initiation at +1 (see Fig. 1).

A variety of biochemical and genetic experiments support the notion of two functional domains in the promoter: a binding domain that extends from -17 to -6, and an initiation domain that extends from -6 to +6 (see Fig. 2). All of the promoters share the same sequence from -6 to +1, indicating a common function for this region of the promoter. However, the sequences of the phage promoters differ significantly in the region from -9 to -12, suggesting that discrimination of specific promoter types may rely upon differences in this region. Mutations in the binding region have been observed to reduce the affinity of the polymerase for the promoter without having a great effect on the rate of initiation, whereas mutations in the initiation region have minor effects on the binding affinity but a greater effect on the rate of initiation (Chapman and Burgess, 1987; Chapman et al., 1988). Upon binding of the RNAP to the promoter, the DNA in the initiation region becomes melted open, as evidenced by a hypochromic shift and hypersensitivity of the nontemplate (NT) strand in this region to attack by single-strand-specific endonucleases (Muller et al., 1989; Osterman and Coleman, 1981).
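
The promoter coordinates above follow the usual convention that +1 is the first transcribed base and there is no position 0. This is why the span from -17 to +6 is 23 bp rather than 24. The bookkeeping can be captured in a small Python sketch; the helper name is hypothetical.

```python
def promoter_positions(start, end):
    """List promoter positions from start to end inclusive, assuming the
    convention that +1 is the initiating base and position 0 does not exist."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return [p for p in range(start, end + 1) if p != 0]


consensus = promoter_positions(-17, 6)   # shared consensus region
binding = promoter_positions(-17, -6)    # binding domain
initiation = promoter_positions(-6, 6)   # initiation domain
print(len(consensus), len(binding), len(initiation))  # 23 12 12
```

The binding and initiation domains as described share position -6, so their lengths sum to one more than the consensus length.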

Base modification experiments indicate that a number of important contacts between the polymerase and the promoter are made within the major groove and the flanking regions from bps -12 to -9, and it has been shown that the primary determinants of T3 vs. T7 promoter specificity are the bps at positions -11 and -10 (Jorgensen et al., 1991; Muller et al., 1989; Klement et al., 1990; Raskin et al., 1992). Substitution of these two bps in the T7 promoter with the corresponding bps found in the T3 promoter prevents recognition by T7 RNAP and simultaneously enables recognition by T3 RNAP (ibid).

…planar view; the template and nontemplate strands are indicated. Important structural elements in the promoter include: positions at which substitutions with modified bases affect the kinetics of initiation (Maslak et al., 1993; Schick and Martin, …); positions at which the sugar-phosphate backbone of the DNA is protected by polymerase binding, as revealed by hydroxyl-radical footprinting (Muller et al., 1989); and positions at which ethylation of the phosphate or methylation of the bases interferes with polymerase binding (Jorgensen et al., 1991). From these data, the contacts of the RNAP appear to involve major groove groups from -6 to -12 and minor groove contacts in the flanking regions on either side. Regions of the promoter that remain double stranded or are rendered partially single stranded during polymerase binding are indicated at the bottom (Osterman and Coleman, 1981; Muller et al., 1989). Other important regions are indicated at the top: the upstream AT-rich region, the region involved in promoter discrimination by individual RNAPs, the region in which the promoter sequences are highly conserved (shared binding), and the initiation region. Similar data for the T3 and SP6 promoters and their RNAPs are indicated in the side panels. Graphics were kindly provided by Dr. Craig Martin (University of Massachusetts).



To localize the region of the phage RNAP that is responsible for discrimination of these base pairs, hybrid T7/T3 RNAPs were constructed (Joho et al., 1990). In this way, the specificity determinant was localized to an 80 amino acid interval between residues 674 and 752. Within this interval, the T7 and T3 RNAP amino acid sequences differ at only 11 positions. Site-directed mutagenesis of this region of T7 RNAP indicates that a single amino acid is responsible for discrimination of the -10 and -11 bps. When this residue (Asn) is substituted by the corresponding residue found in T3 RNAP (Asp), the resulting mutant enzyme (T7-N748D) exhibits T3 promoter specificity, particularly for the bps found at -10 and -11 (Raskin et al., 1992). A consideration of the hierarchy of preference for each of the possible base pair combinations at -10 and -11 indicates that N748 makes direct contacts with bases on the nontemplate strand in a bidentate configuration (Diaz et al., 1993; Raskin et al., 1992). This interpretation is consistent with all of the genetic and biochemical data described above.
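
Designations such as T7-N748D use the common shorthand of wild-type residue, position, and replacement residue, written in one-letter amino acid code. A minimal Python parser for that shorthand is sketched below. It assumes the enzyme prefix (here "T7-") has already been removed, and the function name is hypothetical.

```python
import re

# Standard one-letter to three-letter amino acid codes.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# One wild-type letter, a residue number, one replacement letter.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(name):
    """Split a designation such as 'N748D' into (wild type, position, mutant)."""
    match = SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"not a single-substitution designation: {name!r}")
    wild, pos, mutant = match.groups()
    if wild not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {name!r}")
    return AMINO_ACIDS[wild], int(pos), AMINO_ACIDS[mutant]


print(parse_substitution("N748D"))  # ('Asn', 748, 'Asp')
```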

Substitution of other amino acids at position 748 has generated a collection of T7 RNAP mutants with altered specificities. Some of the mutant enzymes have specificities that correspond to those found in other phage RNAPs (e.g., the SP6 and K11 RNAPs), but others exhibit novel specificities not previously observed (Raskin et al., 1993). In the crystal structure of T7 RNAP, residue N748 lies within a putative DNA binding cleft, at a position that would lie approximately one helical turn (35 Å) upstream from what is believed to be the active site (Sousa et al., 1993; see Fig. 3 and discussion below). This information serves to orient the RNA polymerase with respect to the promoter such that the direction of transcription along the template can be anticipated.
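
The equivalence of 35 Å and one helical turn can be checked with a short calculation. It assumes canonical B-form DNA values of about 3.4 Å rise per base pair and about 10.5 bp per turn. These are textbook approximations, not values taken from the crystal structure cited above.

```python
# Approximate canonical B-form DNA geometry (assumed textbook values).
RISE_PER_BP_ANGSTROM = 3.4  # axial rise per base pair
BP_PER_TURN = 10.5          # base pairs per helical turn


def angstrom_to_bp(distance):
    """Convert an axial distance along B-DNA into an equivalent number of bp."""
    return distance / RISE_PER_BP_ANGSTROM


def turns(distance):
    """Express an axial distance as a fraction of one helical turn."""
    return angstrom_to_bp(distance) / BP_PER_TURN


print(f"{angstrom_to_bp(35.0):.1f} bp, {turns(35.0):.2f} turns")  # 10.3 bp, 0.98 turns
```

Roughly 10 bp is about the spacing between the -10/-11 base pairs contacted by N748 and the initiation site at +1. This is consistent with placing N748 about one turn upstream of the active site.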

RNAP mutants blocked in other functions 

To identify mutations that might affect other functions of the RNAP (catalysis, elongation, termination, etc.), we constructed 35 linker insertion mutants of T7 RNAP in which a 6 bp linker (two amino acids) was placed at various positions in the RNAP gene (Gross et al., 1993). These mutants were subjected to a variety of biochemical assays designed to detect blocks in key steps in the transcription cycle. A number of mutants with interesting biochemical phenotypes were identified, some of which are described below.

An additional region involved in promoter recognition 

Among the linker insertion mutants was a class of RNAPs that retain nonspecific catalytic activity (i.e., they are able to synthesize poly rG on a poly dC template) but have lost promoter-binding ability. Some of these mutations map near residue 748, as expected from the above discussion. However, other mutations map closer to the amino terminus of the protein. Two mutants (ins144 and ins159, which consist of insertions within or before codons 144 and 159) are of particular interest because they lie near a region of T7 RNAP that exhibits significant sequence homology to region 2.4 of the bacterial sigma factor (Gross et al., 1993). This region of sigma factor is known to interact with base pairs in the -10 region of the Escherichia coli consensus promoter sequence (Helman and Chamberlin, 1988; Daniels et al., 1990; Siegele et al., 1989; Waldburger et al., 1990). In the crystal structure of T7 RNAP, this region is found within the DNA binding cleft, not far from the region defined by residue 748 (Fig. 3). Together, these two elements of the DNA binding cleft come in contact with the upstream region of the phage promoter, thus defining a sequence-specific recognition element. The homology of this region to sigma factor suggests that additional common sequence elements may be found between the phage RNAPs and the multisubunit RNAPs.

Active site mutants 

Another interesting class of mutants comprises those that retain promoter-binding activity but have lost catalytic activity. Two interesting mutants within this class (ins640 and ins648) exhibit a characteristic defect in their ability to utilize double-stranded DNA templates but not single-stranded templates. For example, both of these enzymes exhibit significant activity on dC or dI-dC templates, but no activity on a dG:dC template (Gross et al., 1993). We reasoned that the defect in these enzymes might lie in their inability to melt open the double-stranded helix, or in a failure to maintain an association with the template strand during elongation. This was confirmed by the use of synthetic promoters that were "premelted" because the nontemplate strand in the initiation region was missing. The wild-type enzyme is capable of initiating transcription from a fully duplex promoter as well as from the premelted promoter, whereas the two mutant enzymes were capable of initiation only from the premelted promoter (Gross et al., 1993).

The interpretation of these results with regard to the structure of T7 RNAP relied upon a potential similarity between the phage RNAPs and other nucleotide polymerases that was first observed by Delarue et al. (1990). These authors noted a homology in three sequence motifs (A-C) found in many nucleotide polymerases, including DNA polymerase and the single-subunit DNA-dependent RNAPs. Two of these motifs (A and C) are also found in RNA-dependent RNA polymerases as well as RNA-dependent DNA polymerases. In the structure of the Klenow fragment of E. coli DNA polymerase I (KF), these three regions are located near the active site (Ollis et al., 1985). This finding, and the observation that motif B differed in enzymes that utilize RNA vs. DNA as a template, led Delarue et al. to two proposals. First, these polymerases may have evolved from a common precursor (or may use similar structural motifs to carry out common catalytic functions). Second, motif B is likely to be involved in association with the template strand. The two mutations of interest in T7 RNAP (ins640 and ins648) lie within motif B. The inability of these mutant enzymes to melt open promoters or to remain stably associated with the template strand following initiation is consistent with the proposal that motif B is in association with the template strand.

Fig. 3. Structure of T7 RNA polymerase. The schematic depicts T7 RNA polymerase looking into the DNA binding cleft; the axis of the cleft runs vertically, as indicated by the dashed arrow (adapted from Sousa et al., 1993). Structural motifs that are common to other nucleotide polymerases and which define the active site are indicated by selective shading (motifs A, B, and C), as is the region that exhibits homology with sigma factor region 2.4. Key catalytic residues are indicated. Residue N748, which is involved in contacts with the nontemplate bases at -10 and -11, lies on an extended loop at the base of the cleft. The distance from this residue to K631, which may be crosslinked to the initiating nucleotide, is 35 Å (approximately one turn of the double helix).

Certain residues within motifs A, B, and C are highly conserved among all of the polymerases; these include, in particular, K631 in T7 RNAP, which lies in motif B. We and others have shown that this residue may be crosslinked to analogs of the initiating triphosphate. The crosslinked analog may subsequently serve as an acceptor in the formation of a phosphodiester bond with the next (incoming) nucleotide in a template-directed manner (Schaffner et al., 1987; Maksimova et al., 1991). Residue K631 must, therefore, be near the acceptor site in the initiation complex, consistent with its proximity to the template strand in the model described above.

More recent crystallographic data at higher resolution show a close structural correspondence between T7 RNAP and KF, especially in the regions now referred to as the "polymerase fold" (Sousa et al., 1993). A similar structural correspondence has been noted for the HIV reverse transcriptase, lending further support to the notion of a common catalytic mechanism for these enzymes (Kohlstaedt et al., 1992).

DISCUSSION 

The convergence of genetic and biochemical approaches, as well as the availability of a high-resolution crystal structure for T7 RNAP, makes this a particularly exciting time to study the structure and function of an RNA polymerase. As a result of this and other work, considerable information is now available concerning the regions of the RNAP that are involved in promoter recognition, transcript elongation, and termination (for a recent review, see McAllister and Raskin, 1993). There is a growing body of evidence that supports the existence of a common polymerase fold among the simple nucleotide polymerases. This fold is likely to comprise the active site required for basic catalytic functions, and to contain elements that are involved in template binding and positioning of the active site. Other functions are unique to the particular type of polymerase, such as promoter recognition and binding for the RNA polymerases, and proofreading and exonuclease functions for the DNA polymerases. These are likely to be located elsewhere in the polymerase, possibly in auxiliary domains (see, for example, Fig. 3, in which the promoter recognition site is spatially quite separate from the putative active site).

What about the multisubunit RNA polymerases? Do they also share homologies, or have they evolved along a different pathway? It is possible that, as a result of the need to maximize the opportunity for regulation, multisubunit enzymes have distributed their corresponding functional motifs among multiple subunits. Sequence alignment programs may be unable to detect highly divergent motifs that are distributed among many protein subunits. A more fruitful approach may involve searching individual subunits for conserved motifs found in the phage-like RNA polymerases. The potential alignment between sigma factor and T7 RNAP suggests that this may prove to be an attractive method of analysis, although a functional role for this region of the phage RNAP must be confirmed. In any event, it is clear that studies of the structure and function of an elegantly simple RNAP like T7 will provide important clues for understanding the functioning of other polymerases.

Transcription, the first step of gene expression, is carried out by DNA-dependent RNA polymerases (RNAPs). RNAP catalyzes processive polymerization of RNA messages from NTP precursors by using one strand of DNA as a template. Although the reaction is similar to DNA polymerization (catalyzed by DNA polymerases), important differences exist. First, RNAPs are able to initiate synthesis of RNA from a nucleoside triphosphate; i.e., they do not require oligonucleotide primers. Second, the newly synthesized RNA chain is normally displaced from the DNA template. Thus, during transcription, the DNA strands are separated only in a short region around the catalytic center of the enzyme, the so-called transcription bubble. Because Watson-Crick interactions are required for template-dependent RNA synthesis, a transient RNA-DNA hybrid should exist inside the transcription bubble.

After promoter-complex formation, all RNAPs undergo abortive initiation: the catalytic synthesis of short (2- to 8-nt-long) RNA oligomers that are rapidly synthesized and released from the complex. When the nascent RNA chain reaches a critical length of about 9 bases, it becomes stably associated with the transcription complex. RNAP then clears the promoter and elongates the nascent RNA chain in a fully processive manner. Despite its tight grip on nucleic acids, the elongating RNAP moves rapidly along the DNA and RNA chains until it encounters a termination signal, which typically consists of an RNA hairpin followed by a run of uridines. At such sites, RNAP transiently pauses, the stability of the elongation complex suddenly decreases, and the enzyme releases the nucleic acids. The enzyme is then available to initiate transcription from promoters again.

The molecular determinants of transcription-complex stability and processivity are poorly understood. Several competing mechanistic models of RNAP function have been proposed in recent years. Much of the controversy has centered on the length of the RNA-DNA hybrid and its role (or lack thereof) in transcription. Suppose the hybrid were relatively long, say, 8-9 base pairs. Then the relative instability of the initial transcribing complex, and of the complex paused at a terminator, could be explained by suboptimal hybrid length (refs. 1 and 2; Fig. 1). Conversely, the establishment of a full-length hybrid could explain the stabilization of the nascent RNA in the transcription complex during promoter clearance. In contrast, if the hybrid remains short (less than 3 base pairs) throughout elongation, then complex stability should be determined primarily by the strength of the protein-nucleic acid interactions and/or conformational changes (refs. 2 and 3; Fig. 1). Establishing the actual length of the hybrid might also help explain how regulatory factors act. Such factors modulate the rates of abortive initiation and promoter escape, as well as the efficiency of transcription termination.

RNAPs seem to have arisen twice in evolution. A large family of multisubunit RNAPs includes bacterial enzymes, archaeal enzymes, eukaryotic nuclear RNAPs, plastid-encoded chloroplast RNAPs, and RNAPs from some eukaryotic viruses. Members of this family exhibit extensive sequence and structural similarities (4, 5), suggesting that the mechanism of transcription is highly conserved within this group. The best-studied enzyme of this family is the RNAP from E. coli (subunit composition of the catalytic core α2ββ′ω; molecular mass ~380 kDa; it requires the specificity σ subunit to recognize and melt promoter DNA). An unrelated family of single-subunit RNAPs includes enzymes from bacteriophages and mitochondria as well as nuclear-encoded RNAPs of chloroplasts. Members of the latter family are also related to DNA polymerases and to reverse transcriptases (6). RNAP from bacteriophage T7 (Mr ~100 kDa; does not require additional factors for promoter recognition) is the best-studied member of this family.

Considerable evidence suggests that the hybrid length is ~8 base pairs during transcription by E. coli RNAP. For example, RNA-DNA crosslinking experiments have established that only the 8-9 bases closest to the 3' end of the nascent RNA remain close to the template DNA strand in the active elongation complex (7). Similarly, during initiation, the 5' end of the nascent RNA remains close to DNA for about 8-9 bases and then branches away (8). Base-analogue substitutions that either strengthen or weaken Watson-Crick interactions have been used to study elongation complex stability and termination efficiency. These studies also point to 8- to 9-bp hybrids (7). The upstream portion of the hybrid contributes the most to transcription complex stability (8, 9). Furthermore, a structural model of the bacterial RNAP elongation complex neatly accommodates an 8-bp hybrid (10). That model was built by superimposing the positions of protein-nucleic acid crosslinks onto a high-resolution structure.

Despite a lack of sequence similarity between the two RNAP families, the essential elements of the transcription cycle seem to be conserved (11). Thus, if the nucleic acid scaffold plays an essential role in transcription, one would expect that the sizes of the transcription bubble and RNA-DNA hybrid would be similar in transcription complexes formed by enzymes of both classes. It was surprising, therefore, when the structure of T7 RNAP complexed with a synthetic promoter and a trinucleotide RNA transcript revealed that only the two RNA nucleotides closest to the catalytic center made Watson-Crick interactions with the template strand of DNA (12). The 5'-proximal base of RNA appeared to peel away from the template, suggesting that in this system, the hybrid may be as short as 2-3 base pairs. Structural analysis suggested two things. First, further extension of the hybrid would result in severe clashes with the T7 RNAP N-terminal domain. Second, a surface-exposed channel between the T7 RNAP thumb and the N-terminal domains is positioned appropriately to serve as an exit channel for the displaced RNA. If the difference between the RNA-DNA hybrid length in E. coli and T7 RNAP complexes is real, it has important mechanistic implications. For example, although both E. coli and T7 enzymes recognize identical terminators (13, 14), the mechanisms involved must be different (Fig. 1).

The ultimate way to resolve this impasse is to characterize structurally the various intermediates of the transcription cycle formed by both types of RNAP. In the absence of such data, protein-RNA crosslinking and molecular modeling can provide useful information. In a recent issue of PNAS, Temiakov et al. (15) used RNA-DNA and RNA-protein crosslinking to study transcript elongation by T7 RNAP. Temiakov et al. incorporated a crosslinkable analogue of UMP, U*, into defined positions of the nascent RNA of artificially stalled, active T7 RNAP elongation complexes. The flexible spacer arm between the derivatized nucleotide base and the crosslinking group is long enough to allow the crosslinker to reach the complementary strand of a double-stranded nucleic acid. Thus, by following the appearance of RNA-DNA crosslinks between the derivatized nascent RNA and the template DNA strand, one can estimate the length of the RNA-DNA hybrid. The results are unambiguous: RNA-DNA crosslinks persist until the crosslinker is 9 base pairs upstream of the catalytic site (by convention, this position is referred to as -9). When the crosslinker is moved further away from the catalytic center, RNA-DNA crosslinks disappear and RNA-protein crosslinks become prominent.
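The logic of this crosslinking readout can be made explicit with a short sketch. It assumes a hypothetical table that records, for each crosslinker position, whether the dominant product was an RNA-DNA or an RNA-protein crosslink, and it reports the most upstream position that still gives an RNA-DNA crosslink. The function name and the scan values are illustrative only (they mirror the qualitative outcome described above, not measured data), and equating that position with hybrid length in base pairs is a simplification that depends on where the RNA 3' end sits relative to the catalytic site.

```python
from typing import Dict


def hybrid_extent_bp(crosslinks: Dict[int, str]) -> int:
    """Return how far upstream (in bp from the catalytic site) RNA-DNA
    crosslinks are still observed, or 0 if none are seen.

    `crosslinks` maps a crosslinker position (negative; -9 means 9 bp
    upstream of the catalytic site) to the dominant product recorded
    from that position: "RNA-DNA" or "RNA-protein".
    """
    rna_dna_positions = [pos for pos, product in crosslinks.items()
                         if product == "RNA-DNA"]
    if not rna_dna_positions:
        return 0
    # The most upstream position is the most negative one.
    return -min(rna_dna_positions)


# Hypothetical scan mirroring the qualitative result described above:
# RNA-DNA crosslinks up to -9, RNA-protein crosslinks further upstream.
scan = {-7: "RNA-DNA", -8: "RNA-DNA", -9: "RNA-DNA",
        -10: "RNA-protein", -11: "RNA-protein"}
print(hybrid_extent_bp(scan))  # 9
```

The switch from RNA-DNA to RNA-protein products is what defines the boundary; positions that give no crosslink at all would simply be left out of the table.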

The RNA-DNA crosslinking experi- 
ment suggests a hybrid that is longer than 
observed in the structure of initiating T7 
RNAP (12). To model the position of the 
hybrid in an elongating complex, Temiakov et al. (15) mapped the site of a
crosslink between a short-range 
crosslinker incorporated into the nascent 
RNA 9 base pairs upstream of the cata- 
lytic center and RNAP. This critical posi- 
tion, where RNA is displaced from the 
hybrid, seems to contribute to elongation 
complex stability. By using a panel of 
chemical-mapping techniques in combina- 
tion with RNAP mutants, the crosslink 
site was localized to within 7 T7 RNAP 
amino acids in the so-called specificity 
loop (16). The specificity loop is a phage 
RNAP-specific feature, which in the open 
promoter complex recognizes the double- 
stranded DNA 10-12 base pairs upstream 
of the transcription initiation start point. 
Strikingly, 2 amino acids within the 7-aa 
fragment that harbors the crosslink site 
are known to contact the promoter at 
positions -7, -10, and -11 and are crit- 
ical in specific promoter recognition (16, 
17). Thus, it seems that during promoter 
clearance, the contacts between the spec- 
ificity loop and the upstream-promoter 
DNA are broken, and new contacts with 
RNA and possibly DNA at the upstream 
edge of the transcription bubble are es- 
tablished. The latter contacts may prevent 
collapse of the transcription bubble and 
thus stabilize RNA in the elongation 
complex. In addition, continued inter- 
actions between the specificity loop and 
DNA during elongation may contribute 
to sequence-specific pausing by T7 
RNAP (18). 

Localization of the ribonucleotide at the upstream end of the hybrid allowed modeling of the overall position of the hybrid in the elongation complex (the 3' end of the RNA in the hybrid is constrained, because the position of the catalytic center of the enzyme is known; ref. 16). The proposed trajectory results in few clashes, is consistent with much of the biochemical and genetic data, and is not consistent with the RNA-exit pathway suggested by the structure of the T7 RNAP initiation complex (12). In an independent study, Shen and Kang (19) mapped T7 RNAP-RNA crosslinks from either the derivatized 3' end of the nascent RNA or from the -9 position of the nascent RNA in several stalled elongation complexes. Their results are in agreement with those of Temiakov et al. (15).

The picture of the T7 RNAP elongation 
complex that emerges from these studies 
is remarkably similar to our view of elon- 
gation complexes formed by multisubunit 
RNAPs. The RNA-DNA crosslinking result is superimposable with that obtained by Nudler et al. (7), who used the same experimental approach with E. coli RNAP. Thus, during elongation, the RNA-DNA hybrid appears to be 8-9 bp in length in both types of RNAP. In the T7 RNAP elongation complex, the specificity loop is proposed to act as a wedge that both separates the nascent RNA from the DNA template and possibly maintains the upstream edge of the transcription bubble. In the structural model of the bacterial RNAP elongation complex (10), an evolutionarily conserved feature of the largest subunit, the so-called "rudder," seems to play an analogous role. Interestingly, the rudder may also be involved in promoter recognition, because it is close to a region of the specificity subunit σ that recognizes promoter positions -9/-12 (20). Site-directed deletion mutagenesis reveals that the rudder indeed contributes to bacterial RNAP elongation-complex stability by preventing the displacement of the nascent RNA by the nontemplate DNA strand (K. Kuznedelov, N. Korzheva, A. Mustaev, and K.S., unpublished observations). Unfortunately, similar experiments to demonstrate the role of the specificity loop in T7 RNAP elongation and complex stability are complicated, because the specificity loop is strictly required for promoter recognition.

The inconsistency between the structure of the T7 RNAP initiation complex obtained by x-ray crystallography and the view that emerges from biochemical studies could be explained by large conformational changes that may occur during the transition from an unstable initiation complex to a stable elongation complex. A more likely explanation is that in the crystal structure, T7 RNAP had been captured in an act of unproductive synthesis. When the initial transcribed sequence codes for three Gs in RNA, T7 RNAP can synthesize long chains of poly(G) through repetitive cycles of slippage of the nascent RNA along the template strand and addition of the next GMP (21). During such reiterative synthesis, RNAP does not leave the promoter, and naturally the hybrid can be only 3 bp or less. The addition of the nucleoside triphosphate specified by the fourth position of the template (provided it does not code for a G) inhibits slippage and leads to productive initiation. The crystals of the transcribing T7 RNAP complex were obtained in the presence of GTP and chain-terminating α,β-methylene-ATP. Thus, the expected RNA product should have been GGGA. However, the adenosine nucleotide is not present in the structure, and subsequent biochemical data indicate that the ATP analogue has no effect on the slippage reaction (22). Thus, the exit pathway suggested by the early peeling RNA could be that of a slipped transcript and may not be used during productive elongation.
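The expectation that the crystallized complex should contain GGGA follows from a simple substrate rule, sketched below under stated assumptions: an ordinary NTP lets the chain continue, a chain-terminating analogue is incorporated once and stops extension, and a missing substrate halts synthesis. The function name and the encoded bases beyond the first four are hypothetical, and the sketch deliberately ignores slippage, which is exactly the process the biochemical data suggest was captured instead.

```python
from typing import Set


def expected_product(encoded_rna: str, ntps: Set[str], terminators: Set[str]) -> str:
    """Longest RNA that can be made from the 5' end of `encoded_rna`
    with the supplied substrates, ignoring slippage.

    encoded_rna: RNA sequence specified by the template, 5' to 3'.
    ntps: bases supplied as ordinary, extendable NTPs.
    terminators: bases supplied only as chain-terminating analogues.
    """
    product = []
    for base in encoded_rna:
        if base in ntps:
            product.append(base)   # normal incorporation; chain can keep growing
        elif base in terminators:
            product.append(base)   # incorporated, but the chain cannot be extended
            break
        else:
            break                  # required substrate absent; synthesis stalls
    return "".join(product)


# Hypothetical encoded sequence beginning GGGA; only the first four bases matter here.
print(expected_product("GGGACU", ntps={"G"}, terminators={"A"}))  # GGGA
```

With GTP alone the same rule gives GGG, the length at which reiterative slippage can begin; the observation that the structure lacks the A therefore points to a slipped rather than a productively extended transcript.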

Multisubunit RNAPs also undergo transcript slippage, and the strength of some promoters is regulated by variation of productive vs. unproductive, reiterative initiation events (23). As is the case with T7 RNAP, the slippage reaction in E. coli occurs at the end of a run of three or more identical base pairs in the initial transcribed sequence. Thus, during slippage, the RNA-DNA hybrid is short, suggesting that the RNA-exit pathway of the complex engaged in slippage synthesis is different from that in the productive complex (10). Interestingly, in bacterial RNAP a surface-exposed channel exists between the two domains of the second-largest subunit (4) that is positioned analogously to the putative RNA-exit channel seen in the T7 RNAP transcribing complex (12). Consistent with this idea, in bacterial RNAP, mutations that alter the position of one of the second-largest subunit domains or change residues that lie between the two domains result in dramatic changes in the efficiency of reiterative synthesis (24, 25). Future comparative structural analysis of transcription intermediates, in conjunction with biochemical and genetic analyses, should determine the true extent of the functional convergence that nature has arrived at to solve the identical problem of transcribing a defined fragment of DNA with two different protein machines.
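The sequence rule for slippage noted above (slippage at the end of a run of three or more identical bases in the initial transcribed sequence) can be expressed as a small scan. This is a minimal sketch: the 10-nucleotide window used to define the initial transcribed sequence and the function name are assumptions for illustration, and real slippage propensity depends on additional factors that are not modelled here.

```python
from itertools import groupby
from typing import List


def slippage_sites(its: str, window: int = 10, min_run: int = 3) -> List[int]:
    """Return 1-based positions that end a run of >= min_run identical bases
    within the first `window` bases of an initial transcribed sequence.

    The window size is an illustrative assumption; a run that crosses the
    window boundary is truncated at the boundary.
    """
    sites = []
    position = 0
    for _, group in groupby(its[:window]):
        run_length = len(list(group))
        position += run_length
        if run_length >= min_run:
            sites.append(position)  # slippage would occur at the end of this run
    return sites


print(slippage_sites("GGGACUAGC"))  # [3]
print(slippage_sites("GAGACU"))     # []
```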
Structure of a T7 RNA polymerase elongation complex at 2.9 Å resolution
Gene expression in all organisms requires messenger RNA synthesis by DNA-dependent RNA polymerase (RNAP). These enzymes can be divided into two classes: multisubunit (bacteria, archaea, eukaryotes) and single-subunit (some bacteriophages, mitochondria, chloroplasts) RNAPs. Although they share no apparent sequence or structural homology, the RNAPs of both classes carry out the basic steps of transcription in an identical manner. To initiate RNA synthesis, the enzyme must bind to a specific promoter DNA sequence that lies upstream of the start site for transcription, separate (melt) the two strands of the DNA in the vicinity of the start site (forming a transcription bubble), and begin RNA synthesis using the coding strand of the downstream DNA as a template and a single ribonucleotide as a primer. During the early stages of transcription, contacts between the RNAP and the upstream promoter sequences are maintained while the active site translocates downstream, resulting in the formation of a short RNA-DNA hybrid and extension of the transcription bubble. The T7 RNAP initiation complex (IC) is unstable and repeatedly releases short (3-8 nucleotides) abortive RNA products before it undergoes a transition to form a stable elongation complex (EC). The transition starts when the RNA-DNA hybrid reaches a length of 8-9 base pairs (bp) and results in promoter release, collapse of the melted promoter region, and displacement of the 5' end of the nascent RNA (refs 4-7). During elongation the length of the RNA-DNA hybrid is maintained at 7-8 bp, and the transcription bubble closes just after the RNA chain peels away from the DNA template.

The solution of several RNAP structures has resulted in a breakthrough in our understanding of the functional aspects of transcription (refs …-20). The structures of T7 RNAP-nucleic acid complexes revealed that promoter binding involves three principal structural motifs of the RNAP: an (A + T)-rich recognition loop interacts with the -17 region; a specificity loop makes base-specific contacts around -9; and an intercalation loop involving Val 237 facilitates promoter melting and stabilizes the upstream edge of the separated transcription bubble between -5 and -4 (the start site for transcription is designated as +1). The structure of an early IC in which the first three bases of the template strand have been transcribed is essentially unchanged from that of the binary promoter complex, indicating that the initial stages of transcription may be achieved without major changes in enzyme structure. However, the structure of the IC did not allow space in the active site for an RNA-DNA hybrid longer than 3 bp, and therefore did not provide a plausible mechanism for further transcription progress. The crystal structure of the T7 RNAP elongation complex reported here reveals that incorporation of an 8-bp-long RNA-DNA hybrid into the active site results in unprecedented alterations of protein structure, thereby resolving the principal differences between previous biochemical and structural data (refs 14, 21). Although the conformation of the carboxy-terminal portion of T7 RNAP (which includes the active site) resembles that of polymerase I-like DNA polymerases (refs 11, 22, 23), the overall organization of the EC is remarkably similar to that of the multisubunit RNAPs, reflecting the common functions of these two classes of RNAPs.

Structure determination and overall structure 

Elongation complexes were formed and crystallized as described (refs 24, 25). The structure was refined at 2.9 Å resolution to a final R = 23.5% and R_free = 28.4% (Fig. 1a, b; see also Supplementary Table 1).

The RNAP in the EC has a shell-like, highly porous architecture (Fig. 1c-e). The downstream DNA is bound in a deep groove and enters through a wide passage to a cavity that contains an 8-bp RNA-DNA hybrid. The axis of the hybrid is nearly perpendicular to the entering DNA, as in the yeast RNA polymerase II elongation complex (ref. 17). The structure contains partly accessible channels, which presumably correspond to binding sites for the template and non-template DNA strands, and prominent pores for entry of the substrate and exit of the RNA product. The positive charge covering almost the entire interior of the molecule extends through the pores and channels to the external surface. The channels and pores are features characteristic only of the EC.

Protein structure 

During the formation of the EC, the N-terminal domain of the RNAP (residues 2-266) undergoes a remarkable structural reorganization as compared with the IC (Fig. 2). This involves a marked reorientation of a core subdomain (72-151, 206-257), whose internal structure remains unaltered, and an alternative folding of approximately 130 residues. These changes result in the formation of three structural elements: an N-terminal extension (N-subdomain; residues 2-71), a central flap-like subdomain (152-205), and a C-terminal linker that connects the N-terminal domain to the C-terminal portion of the enzyme (C-linker; 258-266) (Fig. 2).
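The residue ranges given above can be checked for internal consistency with a few lines of arithmetic. The sketch below (all names are illustrative) confirms that the N-subdomain, core, flap and C-linker ranges tile residues 2-266 without gaps or overlaps, and that the refolded elements (N-subdomain, flap and C-linker) together comprise 133 residues, in line with the figure of approximately 130 residues.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive residue interval (start, end)

# Residue ranges of the reorganized N-terminal domain, as given in the text.
SEGMENTS: Dict[str, List[Range]] = {
    "N-subdomain": [(2, 71)],
    "core": [(72, 151), (206, 257)],
    "flap": [(152, 205)],
    "C-linker": [(258, 266)],
}


def n_residues(ranges: List[Range]) -> int:
    """Number of residues covered by inclusive intervals."""
    return sum(end - start + 1 for start, end in ranges)


def tiles(segments: Dict[str, List[Range]], first: int, last: int) -> bool:
    """True if the intervals cover first..last exactly once, with no gaps."""
    intervals = sorted(r for ranges in segments.values() for r in ranges)
    expected_start = first
    for start, end in intervals:
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == last + 1


sizes = {name: n_residues(r) for name, r in SEGMENTS.items()}
refolded = sizes["N-subdomain"] + sizes["flap"] + sizes["C-linker"]
print(sizes)                      # {'N-subdomain': 70, 'core': 132, 'flap': 54, 'C-linker': 9}
print(refolded)                   # 133
print(tiles(SEGMENTS, 2, 266))    # True
```

Counting 258-266 inclusively gives nine residues for the C-linker, whereas the C-linker is elsewhere described as 8 residues long, so the exact boundary assignment may differ by one residue.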

Relative to the IC (Protein Data Bank accession number 1QLN; ref. 14; Fig. 2b, c), the core subdomain does not change its internal structure (root mean square deviation, r.m.s.d. = 0.7 Å over 524 main-chain atoms) but moves as a rigid body (35 Å translation and 130° rotation) to its final position in the EC (Fig. 2a, c). This reorientation opens space in the active site to accommodate the expanding RNA-DNA hybrid. Of note, the core exhibits high intrinsic structural similarity (but lacks sequence homology) to a C-terminal region of T7 RNAP and, to a lesser extent, to the corresponding segment of pol I-like DNA polymerases (Taq DNAP; Protein Data Bank accession number 1TAQ; ref. 26) (Fig. 3). This suggests that the core subdomain evolved from duplication of a conserved structural element.
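The r.m.s.d. quoted above (0.7 Å over 524 main-chain atoms) measures how far equivalent atoms lie from each other once the two structures have been superimposed: r.m.s.d. = sqrt((1/N) Σ |a_i - b_i|²). The sketch below computes only this final step and assumes the coordinates are already optimally superimposed (for example by the Kabsch algorithm, which is not shown); it is an illustration of the quantity, not the authors' software.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Å


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation between two equally long lists of
    equivalent atoms whose coordinates are already superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))


# Toy example: each of two atoms is displaced by exactly 1 Å.
print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))  # 1.0
```

A small r.m.s.d. after superposition, together with a large rigid-body translation and rotation, is what distinguishes a domain that moves as a unit from one that refolds.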

In the IC (Fig. 2b, c), the N-subdomain does not interact with nucleic acids, and its fold consists of seven structural motifs: loop-helix-loop-helix-loop-helix-disordered region. In the EC (Fig. 2a, c), the N-subdomain adopts a more compact loop-helix-loop-helix-loop conformation that constitutes a portion of the binding site for the RNA-DNA hybrid. Despite the marked difference in folding, the N-subdomains in the IC and EC are located on the same side of the RNAP molecule, so that a short α-helical fragment (31-43) coincides in both structures. This suggests that the N-subdomain keeps a fixed orientation during the structural transition of the RNAP, and that all alterations in this domain result from alternative folding.

A long, unfolded loop (165-205) intervenes between the N- and 
C-terminal portions of the core in the IC (Fig. 2b, c). In the EC 
(Fig. 2a, c) this loop is folded into the flap subdomain, which 
consists of a helix-turn-helix (HTH) motif followed by a C-terminal 
loop. In the EC, the flap subdomain spans the interval between the 
upstream end of the RNA-DNA hybrid and downstream DNA. 

Finally, whereas the short C-linker connecting the N-terminal domain to the C-terminal domain has a random-coil conformation in the IC (Fig. 2b, c), it adopts an α-helical conformation in the EC (Fig. 2a, c) and extends the C-terminal α-helix of the core subdomain towards the end point of the N-terminal domain.

The C-terminal, DNA polymerase-like portion of the RNAP, which consists of three major domains designated as 'fingers', 'palm' and 'thumb' (refs 11, 12), remains largely unaltered in the EC as compared with the IC (Fig. 2a, b). The primary exception is the specificity loop (739-772; refs …) (Fig. 2a, b). A substantial shortening of the loop hairpin (by about 13 Å) is probably the result of outlooping of the C-terminal strand in the vicinity of residues 757-767, as indicated by weak but visible electron density in this region. Furthermore, the tip of the loop is buried in a hydrophobic cavity formed by the flap and core subdomains, and the region of the loop that is used for base-specific promoter recognition in the IC is shifted. As a result of these changes, and of the reorientation of the core subdomain, the specificity loop becomes separated from the two elements in the core subdomain that were involved in promoter binding in the IC.

Figure 1 The T7 RNAP EC crystal structure. The RNA (light yellow), DNA template (red) and non-template (blue) strands are shown as a 'ball-and-stick' model. a, b, The |Fo - Fc| omit electron density map (3.2σ contour level; green) produced for the RNA-DNA hybrid (a) and for the long N-terminal α-helix (residues 31-62, ball-and-stick model; magenta) (b) is superimposed on the atomic model. The rest of the protein is represented by a ribbon diagram. c-e, Three views of the EC. The protein surface is coloured according to the electrostatic potentials (positive, negative and neutral are dark blue, red and white, respectively).

Figure 2 Comparison between the T7 RNAP EC and the early IC. a, b, Ribbon diagrams of the EC (a) and the early IC (b) are shown in the same orientation with respect to the unaltered C-terminal portion (white). The view in a corresponds to that of Fig. 1c, but rotated 180° around the horizontal axis. Labelled elements include the N-subdomain (2-71), flap (152-205), core subdomain (72-151, 206-257), promoter specificity loop (739-772), transcript and DNA non-template strand. The nucleic acids are shown as ribbon (phosphate backbone) and ball-and-stick (bases) models. c, Amino acid sequence of the N-terminal domain; secondary structures for the EC and the early IC are shown above (cylinders, α-helices; thin lines, coils; dashed lines, disordered regions). The colour scheme is the same as in a and b.

NATURE | 9 OCTOBER 2002 | doi:10.1038/nature01129 | www.nature.com/nature | © 2002 Nature Publishing Group | advance online publication

There are two other noticeable alterations in the C-terminal portion of the enzyme. First, the loop at the tip of the fingers domain (593-612) moves towards the downstream DNA to form part of the DNA-binding site (Fig. 1c, d). Second, the N-terminal end of the long α-helix in the thumb domain bends towards the fingers domain to complete the formation of the pore that leads to the active site (Fig. 1c).

Figure 3 Structural homology of the T7 RNAP core subdomain. a-c, The T7 RNAP core (a), the C-terminal homologous region (b), and the homologous fragment of the Taq DNA polymerase (c) are shown in the same orientation. The numbers of residues used for the superimposition are shown at the bottom.

Protein-nucleic acid interactions 

The position of the downstream DNA is determined largely by long-range electrostatic interactions (Fig. 4a) with parts of the fingers subdomain and with residues in the first turn of the flap subdomain. The DNA strands are separated just before the template strand enters the active site by interaction with the N-terminal part of a helix in the fingers subdomain (642-662), which we designate a 'downstream DNA zipper'. DNA melting involves stacking of the template base at +2 on the Phe 644 side chain of the zipper, which appears to cause the base at +1 to flip out, thereby directing the template strand towards the active site (Fig. 4b). (Nucleotide positions in the EC are designated relative to the active site at +1.) The first unpaired nucleotide of the non-template strand (at +1) is directed towards a deep groove formed by a loop (593-610) and a helix-turn-helix fragment (647-675) from the fingers subdomain, part of the flap subdomain, and the N-terminal portion of the long α-helix from the thumb subdomain (Fig. 1c, d). This groove extends to the template strand exit located between the second turn of the flap subdomain and the C-terminus of the long helix of the N-terminal extension (Fig. 1c).

The 8-bp RNA-DNA hybrid in the EC adopts an underwound A-form conformation of the double helix that is similar to the conformation of the RNA-DNA hybrid in the yeast RNA polymerase II EC (Fig. 1a; r.m.s.d. 0.6 Å; ref. 17). The hybrid-binding cavity complements this shape and has a surface that consists of alternating hydrophobic and positively charged residues, which may facilitate translocation (Fig. 4a). The first 3 bp at the downstream end of the hybrid interact with the protein in a similar fashion as in the T7 IC (Fig. 4a). However, the interactions of the 3 bp at the upstream end of the hybrid involve newly folded elements of the EC. Most notable are interactions of template strand nucleotides at positions -6, -7 and -8 with the C-terminal portion (50-60) of the long helix from the N-subdomain, and of the RNA nucleotides at positions -6 and -7 with the flap subdomain (Fig. 4a).

The upstream boundary of the RNA-DNA hybrid cavity is defined by the C-terminal part of a loop-helix motif (62-71) that connects the long helix of the N-subdomain with the core subdomain. The Gln 58, Glu 63 and Asp 66 side chains in this loop project towards the plane of the base pair at the upstream end of the hybrid at a distance of about 7.3 Å (Fig. 4a, c), which may allow hybrid extension by 1 bp without serious protein alterations.

RNA exit and substrate entry pores 

In the EC, a highly positive pore leads from the upstream end of the RNA-DNA hybrid to the surface, and probably provides an RNA exit pathway (refs 29, 30) (Fig. 1e). The pore is formed by part of the reorganized specificity loop, the newly formed C-linker, and part of the C-terminal portion of the enzyme (291-303). A final step in the transition from the IC to the EC occurs when 10-14 nucleotides of RNA have been synthesized (refs 31, …), and may result from interaction of the displaced RNA with the surface of the pore, thereby stabilizing the transcription bubble.

Figure 4 Protein-nucleic acid interactions. a, Schematic drawing of the protein-nucleic … c, Upstream end of the RNA-DNA hybrid. Colours are the same as in Fig. 2. The side …

Figure 5 Model of the late IC. Two different views are shown in a and b. The protein is represented by the same electrostatic potential surface as in Fig. 1c-e. The nucleic acids are shown as a combination of ribbon phosphate backbone and ball-and-stick models, with the RNA and DNA template and non-template strands coloured in light yellow, red and light blue, respectively. c, Interaction of the protein domains that have principal roles in the structural organization of the late IC (dashed arrows) or the EC (solid arrows) with the components of the transcription bubble, shown schematically. The colour scheme is the same as in Fig. 2.

Another pore provides access to the active site, and is probably the pathway for entering substrate (Fig. 1c). Although this opening is also present in the IC, it becomes more accessible in the EC as a result of a 12° bend of the thumb domain α-helix (372-4…) (Fig. 1d). Substitutions of residues that line the surface of the pore result in marked effects on catalytic activity (ref. 3…), which would be consistent with a role for this region in providing a path for entering substrate or, owing to its proximity to the active site, in catalysis.

The presence of these pores in the EC provides a clear parallel to the multisubunit RNAPs. However, whereas these exit and entry pathways are short in the T7 EC, they are considerably longer (about 30 Å) in the multisubunit RNAPs (refs 15, 16). This may account, in part, for the high rate of polymerization exhibited by T7 RNAP (200 nucleotides per s for T7 compared with 30 nucleotides per s for Escherichia coli RNAP; ref. 1).
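The rate difference quoted above can be put in concrete terms. Assuming constant elongation rates of 200 and 30 nucleotides per s (the values quoted in the text) and ignoring initiation, pausing and termination, the sketch below computes how long each enzyme would take to elongate a hypothetical 3,000-nucleotide transcript.

```python
T7_RATE_NT_PER_S = 200.0     # rate quoted in the text for T7 RNAP
E_COLI_RATE_NT_PER_S = 30.0  # rate quoted in the text for E. coli RNAP


def elongation_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Time to elongate a transcript at a constant rate (no pausing modelled)."""
    return length_nt / rate_nt_per_s


transcript_nt = 3000  # hypothetical transcript length
t7_time = elongation_time_s(transcript_nt, T7_RATE_NT_PER_S)
e_coli_time = elongation_time_s(transcript_nt, E_COLI_RATE_NT_PER_S)
print(t7_time, e_coli_time, round(e_coli_time / t7_time, 2))  # 15.0 100.0 6.67
```

The ratio (about 6.7-fold) is independent of the transcript length chosen; only the absolute times depend on it.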

The transition from an IC to an EC 

Biochemical experiments have shown that during initiation, just before promoter release, the length of the RNA-DNA hybrid is about 8 bp, as is observed in the EC (ref. 7). It therefore seems likely that the conformation of the RNAP in the late IC resembles that of the EC. A number of questions arise concerning the transition from an early IC to a late IC. Does the transition occur in one step, or is it a multistep process? Furthermore, how are the upstream promoter contacts maintained while the hybrid grows to 8 bp? Modelling of the RNA-DNA hybrid as it is extended from 3 bp in the early IC indicates that the first clash of the hybrid would occur with elements of the core domain at 4 bp, but that rotation of the core domain, together with the specificity loop and promoter, could accommodate extension of the hybrid to 5 bp without substantial refolding (consistent with biochemical data; refs …37). After 5 bp, however, the structure of the EC shows that the hybrid is greatly stabilized by interactions with the refolded elements of the N-terminal domain (Fig. 4a), suggesting that reorganization starts at this point. This is consistent with the observation that mutations affecting the elements in the core domain that interact with the hybrid at this length inhibit extension beyond 5 nucleotides (ref. …). The (A + T)-rich recognition and the Val 237 intercalation loops are part of the unaltered core subdomain, and the promoter contacts mediated by these elements may be retained during the reorientation of this subdomain (assuming that the upstream DNA moves along with the core) and might persist up to 8-9 bp. On the other hand, modelling suggests that this reorganization cannot occur without loss of contacts between the specificity loop and the promoter (Fig. 2a). This is consistent with the results of ultraviolet laser crosslinking experiments, which show that promoter contacts at -7 and -8 are lost before the contact at -17 (ref. 5).

The pattern described above suggests that a simple superposition of the IC and EC core subdomains could provide a plausible model for continued binding of the upstream promoter region in the late IC (Fig. 5). The model shows that a transcription bubble of 9-10 nucleotides (which corresponds to a hybrid length of 5-6 bp) would be too short to connect the promoter in the late IC with the hybrid and downstream DNA (not shown). Thus, the rearrangement that is proposed to commence at 6 bp cannot occur as a single step, and is probably a multistep process in which gradual reorganization of the N-terminal domain is accompanied by incremental movement of its core. The 13-14-nucleotide length of the bubble in the late IC (which corresponds to a hybrid of 8-9 bp, and is in agreement with recent fluorescence experiments; ref. 7) suggests a probable pathway for the template and non-template strands that connect the upstream region of the promoter with the hybrid and downstream DNA (Fig. 5). A 25-Å-long positively charged groove, formed by the thumb domain α-helix and the flap subdomain, is likely to form a binding site for approximately 6-8 nucleotides of the downstream non-template DNA strand (ref. …) (Fig. 5). At the upstream edge of the transcription bubble there are two hydrophobic cavities that can accommodate 4 bases of template and 4-5 bases of non-template strand DNA (Fig. 5). The central wall separating these two cavities (the separation loop) is formed by a portion of the core domain (128-133), whereas the remaining portions are formed by newly formed structural elements. If the movement of the core domain is fixed at this point, further extension of the RNA-DNA hybrid would result in promoter release as a consequence of the strain associated with extension, and/or as a result of compression and extrusion of the template and non-template strands from the hydrophobic channels. Biochemical data suggest that promoter release may precede the final collapse of the transcription bubble and displacement of the 5' end of the RNA from the hybrid (ref. 7). Interaction of the separation loop with the template and non-template strands just downstream of the promoter may prevent complete collapse of the bubble until later in the initiation process. In the model, the non-template base at -11, which would later re-anneal to the template strand to form the upstream edge of the bubble after promoter release, is on the protein surface in close proximity to the upstream edge of the RNA-DNA hybrid. This suggests that the final transition to the EC may occur without further major alterations of protein structure.
Structural organization of the EC

The EC structure, in combination with modelled promoter DNA and a plausible trace of the template and non-template strands, comprises a transcription intermediate that corresponds to the late IC. Although the modelled nucleic acids are not suitable for detailed analysis of the interactions, they allow a discussion of the role and significance of new structural elements that are acquired by the RNAP in the course of refolding.

The EC structure shows that the rearranged segments of the N-terminal domain (N- and flap subdomains, and C-linker) are principal elements of the complex. These motifs, which do not have an essential functional role in the early IC, become multifunctional in the late IC and in the EC, and are extensively involved in interactions with various components of the transcription bubble (Fig. 5c). The N terminus of the N-subdomain probably provides a pathway for the template and non-template DNA strands in both the late IC and the EC. The long α-helix of the N-subdomain interacts with the RNA-DNA hybrid in the EC and may also contribute to the formation of the upstream template and non-template DNA-binding sites in the late IC (Fig. 5c). Importantly, the C-terminal linker region of the N-subdomain is in close proximity to the upstream boundary of the hybrid in the EC, and extension of the hybrid by 1 bp would clash with the hydrophilic residues clustered at the tip of this loop (Fig. 4a, c). The loop may therefore act as a hybrid zipper to facilitate strand separation of the hybrid. In bacterial RNAPs the a-subunit 'destabilized loop', which is also likely to interact with the upstream edge of the hybrid, contains two highly conserved acidic residues at the tip that may have essentially the same role.

In the EC structure, the flap subdomain interacts both with the upstream end of the hybrid and with the downstream DNA. In the late IC, it is probably involved in binding of the upstream portion of the template strand. The flap subdomain also forms part of a potential non-template binding site in both the late IC and the EC (Fig. 5c). Finally, in the EC, this subdomain is required to fix the position of the specificity loop (Fig. 2a), which is indispensable for the formation of the RNA exit channel. Thus, the flap subdomain seems to be of central importance to the integrity and function of the late IC and the EC. Consistent with this, mutations within this motif result in a failure to resolve the transcription bubble (ref. …).

Although very short (8 residues), the C-linker also probably interacts with two distinct elements of the transcription bubble: with the upstream DNA template strand in the late IC and with the displaced RNA product in the RNA exit pore (Fig. 5c).

Other than the N-terminal domain, the principal element whose
functional purpose is switched during the transition to an EC is the
specificity loop. While losing its role in promoter recognition, it
forms a portion of the RNA exit channel and is probably involved in
interactions with the displaced RNA product. The flexible out-looped
segment of the loop, which is rich in hydrophilic residues,
may intrude into the RNA exit pore. We speculate that this segment
may be important for responding to pausing and termination
signals. The tip of the specificity loop may also interact with the
DNA template strand in the late IC (Fig. 5c).

Abortive cycling 

Four structures of T7 RNAP have been determined previously: free
enzyme; RNAP bound to the transcription inhibitor T7 lysozyme; a
binary RNAP-promoter complex; and an early IC (refs 11-14). All of these
structures display a very similar conformation of the enzyme (Fig. 2b),
which implies that this conformation probably represents the most
stable organization of the protein. In contrast, the conformation of
RNAP in the EC (Fig. 2a) has never been observed before. The
extensive protein-nucleic acid interface (about 2,500 Å²) in the EC,
which is lacking in the other structures, suggests that the protein
conformation of the EC is greatly stabilized by these interactions.
During the transition to an EC the exposed hydrophobic surface of
the N-terminal domain increases by roughly 1,400 Å², which, in the
absence of the nucleic acid-protein interface, may favour a return to
the more stable IC conformation. Given the probable multistep
nature of the transition from the early IC to the late IC, it seems
likely that the intermediate enzyme conformations would be less
stable than the one observed in the EC, because they lack the full
extent of the final nucleic acid interface. In addition, modelling
shows that the principal N-subdomain and C-linker α-helices can
only be folded gradually during the course of the transition, and that
the unfolded IC loop, which forms the flap domain in the EC, would
lose its stabilizing interactions with the protein during the initial
stage of the transformation. We speculate that competition between
the nascent RNA-DNA hybrid, which requires space in the active site
(and thereby induces the reorganization of the RNAP), and the
tendency of the partially folded protein structures to fold back to the
initial conformation is a major factor contributing to abortive cycling. A
similar mechanism has been proposed for bacterial multisubunit
enzymes, in which the growing RNA-DNA hybrid competes with
initiation factor σ, resulting either in abortive RNA synthesis or in
displacement of σ to form a stable EC (refs 18, 19).



Discussion 

On the basis of footprinting experiments, two principal models
were proposed for the progress of transcription initiation: 'polymerase
inchworming', in which the RNAP structure extends to
include the expanding transcription bubble, and 'DNA scrunching',
in which the RNAP structure remains unaltered while the bubble is
accumulated in the active site. The interpretation of previous
structural data favoured the scrunching model, in which the
RNA-DNA hybrid length was limited to 3 bp and the strain
associated with packing of the transcription bubble in the unaltered
enzyme led to abortive initiation (ref. 14).

Although not excluding a role for scrunching during the early 
stages of transcription (when the hybrid is <4 bp), the T7 RNAP EC 
structure reported here provides strong evidence in favour of 
polymerase inchworming during the later stages of initiation, as 
the structure of the RNAP is expected to change progressively to 
accommodate the RNA-DNA hybrid as it grows to a final length of 
about 8-9 bp.

During the maturation of the EC, the elaboration of the components
of the transcription bubble both induces and stabilizes the
reorganization of the protein structure. We suggest that a similar
role for nucleic acids may also be expected for other multifunctional
nucleic-acid-binding proteins, including the multisubunit RNAPs
during their transition from an IC to an EC.

The overall organization of the T7 RNAP EC bears a marked
resemblance to that of the multisubunit RNAPs. Moreover, the
structural mechanism by which T7 RNAP achieves this final state is
similar to the steps that are observed in bacterial RNAPs. In both
cases the active site that accommodates the components of the
mature transcription bubble is greatly reduced in the early
stages of transcription, and the RNA exit channel is either not
formed or is blocked. These elements probably form gradually in a
process that involves competition between the growing RNA-DNA
hybrid and protein structural elements. The reorganization
observed in T7 RNAP therefore resembles the displacement and
release of the bacterial σ-factor, and we suggest that a similar
situation may exist in the multisubunit eukaryotic RNAPs. The
induced reorganization of the enzyme as it matures and makes the
transition to an EC is therefore probably a consistent theme among
DNA-dependent RNAPs.

Methods 

Crystallization and data collection 

Stable, transcriptionally competent complexes were formed by mixing RNAP at a 1:1
molar ratio with a nucleic acid scaffold consisting of an 18-nucleotide template DNA
strand annealed to 10 nucleotides of downstream DNA and a 12-nucleotide RNA primer,
of which 4 nucleotides at the 5' end were unpaired with the template. Such complexes
exhibit all of the properties of a promoter-initiated EC (ref. 24).

Crystallization was carried out by the sitting-drop vapour diffusion technique at 20 °C
(ref. 25). The drop, containing 2 µl of 10 mg ml⁻¹ complex solution, was mixed with 2 µl of
well solution containing 10% PEG 8000, 10% glycerol, 5 mM β-mercaptoethanol, 100 mM
Tris buffer, pH 8.1. Thin, plate-like crystals (0.5 × 0.5 × 0.05 mm) were grown within 3-5
days and diffracted to 2.6 Å resolution using synchrotron X-ray radiation. The T7 RNAP
EC crystals belong to the P1 space group with unit cell dimensions a = 79.91, b = 84.97,
c = 202 Å, α = 90.36°, β = 92.97°, γ = 109.94°. A complete diffraction data set was
collected at 100 K on SPring-8 beam line BL45XU, and processed with the HKL2000
program package (ref. 43; Supplementary Table 1).
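
The triclinic cell reported above can be checked with the general unit-cell volume formula V = abc·sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ). The Python sketch below is an illustration added here, not part of the original Methods; the function name `triclinic_cell_volume` is hypothetical, and c is taken as the rounded value of 202 Å given in the text.

```python
import math

def triclinic_cell_volume(a, b, c, alpha, beta, gamma):
    """Return the unit-cell volume (Å^3) for edges in Å and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General triclinic expression; it reduces to a*b*c when all angles are 90 degrees.
    factor = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    return a * b * c * math.sqrt(factor)

# Cell parameters of the T7 RNAP EC crystals as quoted above (c rounded to 202 Å).
volume = triclinic_cell_volume(79.91, 84.97, 202.0, 90.36, 92.97, 109.94)
print(f"{volume:.3e} Å^3")
```

With these parameters the volume comes out at roughly 1.29 × 10⁶ Å³. A cubic 10 Å cell gives 1,000 Å³, which is a quick sanity check of the formula.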

Structure determination and refinement 

The structure was determined by molecular replacement using the coordinates of the T7
RNAP IC (Protein Data Bank accession number 1QLN; ref. 14) as a search model (ref. 44). Clear
molecular replacement solutions for four crystallographically independent molecules
were obtained only after removal of the N-terminal domain (266 residues) and all
protruding segments of the C-terminal domain from the initial model, which indicated
large structural alterations of the omitted portions of the molecule (ref. 23). The phases
obtained from the initial structure were improved by density modification and
non-crystallographic symmetry (NCS) averaging using the DM program (ref. 44), resulting in an
interpretable electron density for the omitted N-terminal domain and the RNA-DNA
hybrid. The atomic model was built on the basis of this density using the O program (ref. 43) and
then refined by the CNS program (ref. 46). At this stage, however, the refinement was unstable,
resulting in large portions of density that disappeared from the (2Fo − Fc) electron density
maps after additional refinement cycles. Careful inspection of diffraction data revealed
barely separated twin spots at medium resolution, which became more pronounced at high
resolution. This was characteristic of all crystals examined. To avoid this problem, the size
of integration spots was increased at the stage of data processing to include both portions
of the spots twinned at high resolution, and the diffraction intensities were calculated
using summation rather than profile fitting to allow for better evaluation of the intensities
for the two peaks in the integrated area. The four-fold NCS averaging appeared to be a
powerful tool for the evaluation of the quality of the diffraction data (see Supplementary
Information section 'NCS averaging'). The reprocessed data at 2.9 Å resolution resulted in
clear electron density, which allowed for unambiguous modelling of the missing regions of
the molecules with almost all protein side chains. The four independent molecules in the
asymmetric unit are almost identical, with r.m.s.d. between the molecules ranging from
0.77 to 1.12 Å for the protein main chain and nucleic acid atoms. The downstream DNA is
flexible in two molecules, and two regions of the protein were not resolved (757-765 and
352-371). Only weak electron density was visible for the four unpaired nucleotides
(AACU) in the 5' region of the RNA and for the last base pair of the downstream DNA
(G»G position +10). Figures 1a, b, 2a, b, 3 and 4b, c were prepared using the programs
Molscript (ref. 47), Bobscript (ref. 48) and Raster3D (ref. 49). Figures 1c-e and 5 were
prepared using the GRASP program (ref. 50).
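
The r.m.s.d. values between the four independent molecules are computed after an optimal least-squares superposition of equivalent atoms. A common way to obtain that superposition is the Kabsch algorithm; the NumPy sketch below shows it on synthetic coordinates. It assumes equal-length, already-paired atom lists and that NumPy is available; `kabsch_rmsd` is a hypothetical name, and this is an illustration of the general technique, not the software the authors used.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Return the RMSD between two N x 3 coordinate arrays after optimally
    superimposing P onto Q with a rigid-body rotation and translation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays")
    # Remove the translational component by centring each set on its centroid.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Apply R to every row of Pc and measure the per-atom residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Usage: a rotated and shifted copy of a random "molecule" superimposes exactly.
rng = np.random.default_rng(0)
Q = rng.normal(size=(50, 3)) * 10.0
t = np.radians(35.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([12.0, -4.0, 7.0])
print(kabsch_rmsd(P, Q))  # close to zero, limited only by floating-point round-off
```

The reflection check matters: without it, a mirror image of a molecule could be "superimposed" by an improper rotation and report an artificially low RMSD.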
To make mRNA transcripts, bacteriophage T7 RNA
polymerase (T7 RNAP) undergoes a transition from an
initiation phase, which only makes short RNA transcripts,
to a stable elongation phase. We have determined at 2.1 Å
resolution the crystal structure of a T7 RNAP elongation
complex with 30 base pairs of duplex DNA containing a
"transcription bubble" interacting with a 17 nucleotide
RNA transcript. The transition from an initiation to an
elongation complex is accompanied by a major refolding
of the N-terminal 300 residues. This results in loss of the
promoter binding site, facilitating promoter clearance,
and creates a tunnel that surrounds the RNA transcript
after it peels off a 7 base pair heteroduplex. Formation of
the exit tunnel explains the enhanced processivity of the
elongation complex. Downstream duplex DNA binds to
the fingers domain and its orientation relative to
upstream DNA in the initiation complex implies an
unwinding that could facilitate formation of the open 

Despite structural differences, the 99 kDa single subunit RNA
polymerase from the bacteriophage T7 (T7RNAP) and the
multisubunit cellular RNA polymerases share numerous
functional characteristics. Both families of RNA polymerases
have initiation and elongation phases of transcription (1).
During the initiation phase, RNAP binds to a specific DNA 
promoter, opens the duplex at the transcription start site and 
initiates RNA synthesis de novo. Transcription during this 
phase is unstable and characterized by repeated abortive 
initiation events that produce short RNA fragments (2-6 
nucleotides) (2, 3). After synthesis of a 10-12 nucleotide
RNA, the polymerase enters the elongation phase and
completes transcription of the mRNA processively without
dissociating until termination. There are significant
biochemical differences between the initiation and elongation
states. Footprinting assays show differences in DNA
protection (4-6). The T7RNAP-DNA complex is
significantly more stable in the elongation phase (3, 7), and
T7 lysozyme, a natural inhibitor of T7RNAP, inhibits the
transition from the initiation to the elongation state but has little
effect on the activity of the transcribing elongation complex
(8). Here, we describe the structural basis for promoter
opening, the transition from the abortive initiation to
processive elongation phases, promoter clearance,
regulation by T7 lysozyme and the unwinding of downstream
DNA.



The structure of T7RNAP was largely unchanged when
complexed with the transcription inhibitor T7 lysozyme
(9), a 17 base pair open promoter DNA (10), or a 17 base pair
promoter containing a 5' template extension of 5 nucleotides
and a 3 nucleotide RNA transcript (11). The C-terminal two
thirds of T7 RNAP is homologous to the polymerase domain
of the pol I family DNA polymerases (12-15), while a novel
N-terminal domain (residues 1-325) is unique to the RNA
polymerase. The structures show that one antiparallel β loop,
named the specificity loop (residues 740-770), makes
sequence-specific contacts with the promoter while another,
the intercalating hairpin (residues 230-240), opens the
upstream end of the transcription bubble (10, 11). The
structures of the two promoter-containing complexes also
provide support for the 'scrunching' model of transcription
initiation, in which RNA synthesis leads to an accumulation of
the DNA template within the active site before the promoter
is released (11).

Nevertheless, the structure of the transcribing initiation
complex could not explain many aspects of the elongation
phase. Extensive proteolysis that results in loss of the
N-terminal 180-residue fragment abolishes the elongation phase;
although the polymerase can initiate transcription from a
promoter, it makes only abortive transcripts (3, 16). A
proteolytic cleavage of T7RNAP after residue 173 or 180
results in somewhat decreased efficiency of elongation and
decreased single-strand RNA binding, which suggests that the
integrity of the region between residues 170-180 plays a role
in elongation (3). However, since the proteolytically nicked
region is located at least 40 Å away from the 5' end of the
RNA transcript in the initiation complex structure, it was not
clear how this remote site could affect elongation. Similarly,
a single mutation, E148A, which abolished synthesis of any
transcript longer than 5 nucleotides (17), is located at least
35 Å away from the 5' end of the mRNA in the initiation
complex structures.

The most puzzling paradox, however, arose from the
apparent incompatibility of the biochemical and structural
evidence for the maximum length of the DNA-RNA
heteroduplex during transcription. The structure of the
initiation complex contained only 3 base pairs of DNA-RNA
heteroduplex, and indeed any extension of the heteroduplex
beyond three base pairs was deemed to be sterically excluded
by the protein structure (11). In contrast, numerous
biochemical studies, including a recent crosslinking of the
RNA and DNA strands, led to the conclusion that the length
of the DNA-RNA heteroduplex during elongation is
approximately eight base pairs (6, 18, 19). Various attempts
(19) to accommodate these data using the existing T7RNAP
structures seemed implausible. We show here that these
apparent conflicts arose because an important piece of the
puzzle in understanding the transition from transcription
initiation to elongation by T7RNAP was missing.

The crystal structure of a T7RNAP complex trapped in a
functional elongation mode with a transcription bubble of
DNA and a seven base pair RNA-DNA heteroduplex shows
that the N-terminal domain changes conformation
dramatically compared to the initiation complex. As a
consequence of this change, the promoter binding site is
destroyed, and a channel that accommodates the heteroduplex
in the active site and an exit tunnel through which RNA can
pass are created. These structural features account for
promoter clearance and processivity in the elongation phase.

Structure of an elongation complex. The T7RNAP was
co-crystallized with a DNA containing a transcription bubble
and mRNA, a complex that mimics the elongation phase, and
its structure was determined at 2.1 Å resolution. The 30 base
pair duplex DNA contains a central region of 11
non-complementary bases and a 17 nucleotide RNA that is
complementary to the template for 10 nucleotides at its 3' end
(Fig. 1A). The RNA of this substrate can be extended by the
polymerase in a template-directed and processive manner in
the absence of promoter (20), and it possesses other features
of a normal promoter-initiated elongation complex, as seen in
the earlier work of von Hippel and Daube (21). The T7RNAP
elongation complex was assembled by mixing the polymerase
with the substrate after the three oligonucleotide strands of
template DNA, non-template DNA and RNA had been
annealed (22). The structure was determined by single
wavelength anomalous diffraction using selenomethionine-substituted
T7RNAP and by molecular replacement using the
polymerase domain of the T7RNAP initiation complex as a
search model. The phases derived from these two sources
were weighted and combined. Density modification was
applied to the initial electron density map calculated with
combined phases to further reduce the phase errors and
improve the map (Fig. 1B) (23). The final refined model has
an R factor of 24.1% (Rfree = 27.5%). The data collection and
refinement statistics are provided in Table 1.
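
For reference, the R factor quoted above is conventionally defined as R = Σ||Fo| - |Fc|| / Σ|Fo| over the working reflections, and Rfree is the same quantity computed over a small test set of reflections withheld from refinement. The sketch below is only meant to make that definition concrete: the function names are hypothetical, the amplitudes are toy values rather than data from this study, and the calculated amplitudes are assumed to be already scaled to the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

def r_work_and_free(f_obs, f_calc, is_free):
    """Return (R_work, R_free); is_free flags test-set reflections
    that were excluded from refinement."""
    work_o = [fo for fo, free in zip(f_obs, is_free) if not free]
    work_c = [fc for fc, free in zip(f_calc, is_free) if not free]
    test_o = [fo for fo, free in zip(f_obs, is_free) if free]
    test_c = [fc for fc, free in zip(f_calc, is_free) if free]
    return r_factor(work_o, work_c), r_factor(test_o, test_c)

# Toy amplitudes: three working reflections and one free reflection.
f_obs = [100.0, 200.0, 300.0, 400.0]
f_calc = [90.0, 210.0, 300.0, 380.0]
is_free = [False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
print(round(r_work, 4), round(r_free, 4))  # 0.0333 0.05
```

Because the free reflections never enter refinement, a gap between Rfree and R (27.5% versus 24.1% here) is the usual guard against overfitting the model to the working data.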

The T7RNAP protein is seen bound to duplex DNA and its
RNA transcript annealed to a central, opened section of DNA
in the active site (Fig. 1B). The active site is located in an
enlarged channel bounded by the fingers, thumb and palm
domains of the C-terminal polymerase domain. Although the annealed
construct contained ten nucleotides of complementary
heteroduplex RNA-DNA by design, the elongation complex
active site contains only seven base pairs of heteroduplex
DNA-RNA. After seven base pairs, the 5' end of the mRNA
is separated from the template by an α helix of the thumb
domain and enters a positively charged tunnel in the protein,
while the template strand remains bound to the thumb
domain. The single-stranded, non-template DNA in the
bubble is separate from the template DNA-RNA heteroduplex
and makes sequence-independent contacts with the protein.
The template and non-template strands merge to form duplex
DNA at both the upstream and downstream ends of the
bubble (Fig. 1B). Only three nucleotides of DNA upstream of
the DNA-RNA heteroduplex are visible in the electron
density map. At the downstream end of the bubble the non-template
DNA strand base pairs with the template strand at a
position that is one nucleotide beyond the incoming
nucleotide (n + 1), and all 10 downstream base pairs are
clearly visible in a positively charged cleft formed by the
fingers domain. Overall, the 3' terminal 10 nucleotides of
RNA as well as 22 nucleotides each of template and non-template
DNA are visible in the electron density maps, while
eight base pairs of upstream DNA duplex and the 5' terminal
seven nucleotides of single stranded RNA are not (Fig. 1A).
The seven base pair RNA-DNA heteroduplex seen in this
structure is in good agreement with the ~8 base pair length
derived by biochemical studies (6, 18, 19). The total of 22
nucleotides of DNA seen in the structure also agrees with the
22-24 nucleotide length of DNA protected by the
polymerase in footprinting experiments of an elongation
complex (4, 6, 16).

This elongation complex is analogous to the binary 
complexes of primer-template DNA with DNA polymerases; 
its primer terminus is located at the post-translocation 
position ready to accept an incoming nucleotide. In this 
complex, as with the corresponding DNA polymerase 
complexes (24), the base that is to form the template for the 
incoming nucleotide (n) lies in a pocket in the fingers 
domain, rather than in an orientation that would allow it to 
pair with the incoming nucleoside triphosphate. The position 
of the primer terminus relative to the palm domain also is 
identical to its position in the transcription initiation complex 
lacking the NTP (10). We have additionally determined the
structure of an elongation complex after insertion of the
nucleotide at the n position, yielding an 8 base pair
heteroduplex, which provides insights into the mechanism of
translocation (Yin and Steitz, in preparation).

T7 RNAP structural transition from initiation to
elongation. Comparison of the polymerase structure in the
initiation complex with its structure in the elongation
complex shows that portions of the enzyme, most notably the
N-terminal domain, have undergone significant
conformational changes that alter its shape and tertiary
structure (Fig. 2). The structural changes seen in the
N-terminal domain involve three different regions, each
undergoing a different kind of conformational alteration: 1)
Rigid body motion. Six α-helices, D,E,F,G and U, rotate by
140° and translate by 30 Å as a rigid body from their location
in the initiation complex (Fig. 2F). The ends of helices F and
G pack against the third base pair of heteroduplex in the
initiation complex and must move to accommodate a longer
heteroduplex. The six helices are repositioned into the region
that is occupied by the promoter DNA in the initiation
complex, thereby abolishing the interaction between
T7RNAP and the promoter and thus explaining promoter
clearance. The intercalating hairpin, which opens the
promoter in the initiation complex, moves and becomes
disordered in the elongation complex, accounting for
mutations in this loop that reduce the efficiency of initiation
but not elongation (25). 2) Rvt^ion of an α-helix. In a
conformational change that is strikingly reminiscent of one
undergone by the flu virus hemagglutinin protein upon a pH
change (26), helix C1 becomes significantly elongated, from
22 Å to 50 Å, by the stacking of helix C2 on top of C1 and the
refolding of the disordered loop between C2 and helix D to
further elongate the C1-C2 helix (Fig. 2E). The elongated C-helix
now protrudes into the region formerly occupied by the
six-helix assembly in the initiation complex, implying that the
two conformational changes are likely to be coordinated. 3)
Formation of anti-parallel α-helices. Perhaps the most
dramatic and unprecedented conformational change involves
residues 160 to 190, which not only extensively refold but
also move about 70 Å from one side of the polymerase to the other
(Fig. 2G). This region refolds from a short helix and an
extended loop into a pair of anti-parallel helices (H1, H2/3)
(Fig. 2A,B,F). The newly formed compact structure, named
subdomain H, forms part of the RNA-transcript exit tunnel
and contacts the 5' end of the RNA transcript on one surface
and the non-template DNA on the opposite surface.
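
A rigid-body motion such as the 140° rotation and 30 Å translation of the six-helix assembly is usually obtained by superimposing the helices from the two complexes and decomposing the resulting transformation. For a proper rotation matrix R, the identity trace(R) = 1 + 2 cos θ gives the rotation angle θ directly. The sketch below assumes a rotation matrix and a translation vector (for example, the displacement of the helix centroid) are already in hand; the function names are hypothetical, and the numbers are a toy example chosen to match the magnitudes in the text, not the actual coordinates.

```python
import math

def rotation_angle_deg(R):
    """Rotation angle (degrees) of a 3 x 3 proper rotation matrix R,
    from the identity trace(R) = 1 + 2 cos(theta)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against round-off before taking arccos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))

def translation_magnitude(t):
    """Length of a translation vector t, in the same units as t (e.g. Å)."""
    return math.sqrt(sum(x * x for x in t))

# Toy example: a 140 degree rotation about the z axis plus a 30 Å shift along x.
theta = math.radians(140.0)
Rz = [[math.cos(theta), -math.sin(theta), 0.0],
      [math.sin(theta),  math.cos(theta), 0.0],
      [0.0, 0.0, 1.0]]
print(round(rotation_angle_deg(Rz), 6))         # 140.0
print(translation_magnitude([30.0, 0.0, 0.0]))  # 30.0
```

The rotation angle is independent of the choice of origin, whereas the translation magnitude depends on the reference point, which is why it is normally reported for the centroid of the moving unit.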

In contrast with the N-terminal domain, the C-terminal
domain undergoes fewer structural changes (Fig. 3B). The
thumb domain rotates by about 15° from its orientation in the
initiation complex and, together with subdomain H,
creates a binding cleft for the non-template strand DNA (Fig.
3C,D). In the initiation complex, the specificity loop crosses
the active site to make sequence-specific interactions in the
major groove of the promoter DNA (10, 11), and it lies in a
position that blocks the path through which RNA exits in the
elongation complex. In the elongation complex the specificity
loop moves sideways to open the exit tunnel and becomes a
part of it. The tip of the specificity loop, which contacts
promoter DNA in the initiation complex, now contacts the 5'
end of the RNA message in its passage through the tunnel
(Figs. 3D and 4), consistent with the observation that the RNA
transcript can be photocross-linked to the specificity loop
(19). This conformational change of the specificity loop may
also be associated with promoter release.

The mutation E148A, which lies remote from substrates in
the initiation complex, abolishes synthesis of transcripts
longer than 5 nucleotides (17). This may be due to the
inability of the E148A mutant to make interactions necessary
for the structural transition of the specificity loop in forming
an elongation complex. E148 stacks directly against M750
and interacts indirectly with N748 at the tip of the specificity
loop through R155 to bend the specificity loop toward the 5'
end of the RNA. The E148A mutant cannot make these
interactions to secure the bent configuration of the
specificity loop and would therefore affect the integrity of the
exit tunnel. Thus, structural and biochemical studies agree
on the dual function of the specificity loop.

This massive structural reorganization of the N-terminal
domain upon formation of the elongation complex creates a
tunnel through which the RNA can exit and a binding site for
the single stranded non-template DNA of the transcription
bubble from n-7 to n. The newly formed exit tunnel, whose
interior is positively charged, measures approximately 8 Å in
diameter and 20 Å in length and is formed by the thumb
domain, the specificity loop and subdomain H (Figs. 3C, 3D
and 4). After seven base pairs of heteroduplex, three
nucleotides of RNA are separated from the DNA by the rim
of the exit tunnel (residues 170-180) and the thumb domain
(Fig. 3C). The single stranded 5' end of the RNA transcript is
seen entering the exit tunnel (Fig. 4B,D). Model building
suggests that the tunnel may accommodate five extended
nucleotides, implying that an RNA transcript longer than 12
nucleotides would emerge from the side of the tunnel
opposite the active site. The exit tunnel contacts the RNA
only through interactions with the phosphates of the
sugar-phosphate backbone. Residues Arg756 and Gln754 from the
specificity loop, as well as Asn171 and Lys172 on
subdomain H, are all within hydrogen bonding distance of
phosphates at the 5' end of the single stranded mRNA (Fig.
3D).
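
The 12-nucleotide threshold follows from simple bookkeeping: seven nucleotides paired in the heteroduplex plus up to five extended nucleotides in the tunnel. The Python sketch below encodes that arithmetic, assuming a fixed 7 bp hybrid and the five-nucleotide tunnel capacity suggested by model building; it ignores any flexibility in either value, applies only to the elongation complex, and the function name is hypothetical.

```python
HYBRID_BP = 7            # RNA-DNA heteroduplex length seen in the EC structure
TUNNEL_CAPACITY_NT = 5   # extended nucleotides the exit tunnel may hold (model building)

def partition_transcript(length_nt, hybrid_bp=HYBRID_BP, tunnel_nt=TUNNEL_CAPACITY_NT):
    """Split a transcript, counted from its 3' end, into the portion paired in the
    hybrid, the portion inside the exit tunnel, and the portion that has emerged."""
    if length_nt < 0:
        raise ValueError("transcript length must be non-negative")
    in_hybrid = min(length_nt, hybrid_bp)
    in_tunnel = min(max(length_nt - hybrid_bp, 0), tunnel_nt)
    emerged = max(length_nt - hybrid_bp - tunnel_nt, 0)
    return in_hybrid, in_tunnel, emerged

for n in (12, 13, 17):
    print(n, partition_transcript(n))
# 12 (7, 5, 0)
# 13 (7, 5, 1)
# 17 (7, 5, 5)
```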

The processivity of the elongation complex, in contrast to
the initiation complex, could be explained by the appearance
of the mRNA exit tunnel, which topologically surrounds
transcripts longer than 10-12 nucleotides. Its effect on
processivity is entirely analogous to that of the sliding clamp
in DNA replication (27). Further, the stability of the
elongation complex, compared to the abortive phase complex,
is enhanced by the extensive interaction between the seven
base pairs of heteroduplex and its binding site. That
proteolytic cleavage between residues 170 and 180 results in
reduced elongation efficiency (5, 76) may be a consequence
of a reduced integrity of subdomain H and reduced stability of the tunnel
in the elongation phase. Complete proteolytic digestion of the
N-terminal 180 residues results in a T7RNAP capable of
abortive synthesis but incapable of elongating transcripts
beyond eight nucleotides (3, 75). As this enzyme would be
missing helices D-G and subdomain H, it may be incapable of
destroying the promoter binding site, which is required for
clearance, and unable to form the RNA exit tunnel required for
processive synthesis.

During the transition from initiation to elongation,
T7RNAP relinquishes its sequence-specific grasp of the
promoter and begins translocation along DNA, a process
often referred to as promoter clearance, which is achieved by
the destruction of the promoter binding site and movement of
the six helices by 30 Å into the position formerly occupied by
the promoter DNA (Fig. 2). Upstream DNA now binds in a
sequence-independent manner to a newly created cleft that is
formed in part by the thumb domain and helix C2 (Figs. 2B,
4B). These two upstream DNA binding sites are separated by
at least 40 Å and cannot be occupied simultaneously by
upstream DNA, since formation of one binding site dismantles
the other. The DNA that is upstream of the transcription
bubble and visible in our complex is not base paired, owing to
the non-complementarity of the designed sequence.
Presumably, upstream DNA of complementary sequence
would form a duplex that would lie in the upstream channel.

What causes the T7RNAP to undergo this significant
conformational change and what stabilizes the elongation
phase structure? Since the apo enzyme has essentially the
same structure as the initiation complex (10, 11), it
seems likely that the formation of the longer RNA:DNA
heteroduplex is playing a significant role. We previously
noted that it was not possible to elongate the heteroduplex
beyond three base pairs in the initiation complex due to a
steric clash with helices F and G (11). Here, we observe that
the three base pair heteroduplex in the initiation complex and
the seven base pair heteroduplex in the elongation complex
are bound and oriented identically on the polymerase active
site (Fig. 3B). When the Cα backbone atoms of the palm
domains of the initiation and elongation complexes are
superimposed, the first two base pairs of the initiation and
elongation complexes superimpose with an RMSD of 0.34 Å
(Fig. 3B). Thus, it seems likely that incorporation of a fourth
nucleotide into the transcript would result in a steric clash and
a destabilization of the initiation complex structure. Indeed,
our attempts to elongate the 3 nucleotide transcript by one
nucleotide destroyed transcribing crystals of the initiation
complex. However, DNA protection experiments suggest that
T7RNAP is still bound to the promoter in the presence of a
six nucleotide transcript (4), which is inconsistent with a
complete conversion of T7 RNAP to the elongation complex
structure seen here. Taken together, the structural and
footprinting data imply either that an additional T7RNAP
conformation exists that allows the formation of a longer
heteroduplex than can be accommodated by the initiation
complex while still making promoter-specific interactions, or,
alternatively, that the RNA peels off the template after 3
nucleotides, as suggested earlier (12), until it becomes long
enough to form the seven base pair heteroduplex seen in the
elongation complex. It is not obvious what kind of
intermediate conformational change would move helices F
and G out of the path of an elongating heteroduplex without
destroying the promoter binding site, since helices F and G
move in concert with helices D-G in the observed transition
(Fig. 2). In any case, the longer heteroduplex should
destabilize the initiation complex conformation of T7RNAP
and make interactions that stabilize the elongation
conformation of the enzyme. Indeed, the total surface area of
contact between T7 RNAP and the DNA/RNA substrates is
the same in the initiation and elongation structures, but only
after the heteroduplex reaches 7 b.p. in length.

One might ask why the abortive synthesis of short
oligonucleotides exists and why the enzyme might not be
"designed" to carry out the stable RNA synthesis that occurs
in the elongation phase right from the start. Initiation of
RNA synthesis at a particular site, which is required for specific
gene expression and regulation, together with the need for de
novo, unprimed synthesis, necessitates binding of the
polymerase at a specific DNA location, the promoter.
Furthermore, the binding of T7RNAP to both the promoter
and downstream DNA appears to be essential for opening the
bubble. Since short transcripts (2-4 nt) cannot form stable
heteroduplexes, a polymerase leaving the promoter prematurely
would presumably lead to bubble closure and transcript
displacement by the non-template strand. An enzyme locked
in the elongation mode conformation seems unlikely to be
capable of specific initiation and bubble opening.

17 RNAP opening of the transcription bubble. 
T7RNAP appears to facilitate formation of a transcription bubble by untwisting and bending duplex DNA. To derive the degree of promoter untwisting and bending upon binding to T7RNAP, a complete open promoter DNA was constructed by superimposing the palm domains of the open promoter and the elongation complexes and combining the upstream DNA from the former with the downstream DNA of the latter. This complete open promoter contained 13 b.p. of upstream promoter duplex, 10 b.p. of downstream duplex and six nucleotides of template between -4 and +2, for a total of 29 nucleotides of template DNA. After the 13 b.p. promoter is superimposed on one end of a 29 b.p. B-form duplex, the other end of the straight DNA has to be bent by 80° and untwisted by 146° to superimpose on the downstream DNA of the complete open promoter. The bubble is 6 nucleotides long and includes nucleotides -4 to +2 (promoter numbering). The template strand of the bubble, which is visible in the complexes, bends sharply (~90°) at position -4 and descends into the active site, and likewise bends about 80° after +2 to re-emerge from the deeply buried active site and rejoin the downstream duplex at +3.
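The nucleotide bookkeeping in the paragraph above (13 + 6 + 10 = 29 template nucleotides, a bubble spanning -4 to +2) can be checked with a short Python sketch. It assumes the standard promoter-numbering convention in which +1 is the first transcribed nucleotide and there is no position 0. The conversion of the 146° untwisting into base pairs' worth of helical twist is an added illustration that assumes about 10.5 bp per B-DNA turn; it is not a figure from the source. The function and constant names are hypothetical.

```python
# Assumed convention: promoter numbering has no position 0, so -1 is followed by +1.
def promoter_positions(start, end):
    """Inclusive list of promoter-numbered positions from start to end, skipping 0."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [p for p in range(start, end + 1) if p != 0]

UPSTREAM_DUPLEX_BP = 13    # upstream promoter duplex (open promoter complex)
DOWNSTREAM_DUPLEX_BP = 10  # downstream duplex (elongation complex)

bubble = promoter_positions(-4, 2)   # single-stranded template in the bubble
total_template_nt = UPSTREAM_DUPLEX_BP + len(bubble) + DOWNSTREAM_DUPLEX_BP

# Illustrative only: 146 degrees of untwisting expressed as base pairs of
# B-DNA twist, assuming ~10.5 bp per helical turn.
TWIST_PER_BP_DEG = 360.0 / 10.5
untwist_bp_equiv = 146.0 / TWIST_PER_BP_DEG

print(bubble)                      # [-4, -3, -2, -1, 1, 2]
print(total_template_nt)           # 29
print(round(untwist_bp_equiv, 1))  # 4.3
```

Skipping zero is what makes "-4 to +2" six nucleotides rather than seven.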

We suggest that the energy required to melt the 6 base pairs of duplex to form the bubble (about 9-16 kcal/mol) may arise from the reduction of the DNA twist by about 146° and from changing the relative orientations of the upstream and downstream DNA axes by 80°. The underwinding of DNA produced by promoter binding could destabilize the duplex by up to 24 kcal/mol (28), while the bending may destabilize it by as much as 25 kcal/mol (28); either one alone is sufficient to melt the duplex. It is presumably the extensive interaction between the enzyme and a bent, unwound, open promoter DNA that produces the free energy required to distort the DNA, thereby destabilizing and opening the duplex. The total surface area of the initial open promoter DNA that interacts with T7RNAP is 2700 Å². Using the conversion factor of 25 cal/Å² of buried surface area, which is often used to evaluate the energetic contributions of hydrophobic interactions to binding (29), we can calculate that as much as 68 kcal of intrinsic interaction energy may be available to pay for DNA distortion, for the loss of conformational entropy on immobilization, and for a 10⁻⁹ M dissociation constant (30). In this regard, it may be interesting to note that the several insertions in the fingers domain of T7RNAP, as compared with the Klenow fragment, serve to greatly increase the interaction surface with downstream DNA in the RNA polymerases.
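The arithmetic behind the 68 kcal estimate, and its comparison with the quoted melting and destabilization energies, can be reproduced in a few lines of Python. The numerical inputs are those quoted above. Converting the 10⁻⁹ M dissociation constant into a binding free energy (ΔG = RT ln K_d, 1 M standard state, 25 °C) is an assumed illustration added here and is not taken from the source. The function names are hypothetical.

```python
import math

# Values quoted in the text above
BURIED_AREA_A2 = 2700.0       # Å² of open-promoter DNA surface contacting T7 RNAP
CAL_PER_A2 = 25.0             # cal per Å² buried (empirical hydrophobic factor, ref. 29)
MELT_COST_KCAL = (9.0, 16.0)  # kcal/mol to melt the 6-bp bubble (range)
UNDERWIND_KCAL = 24.0         # maximum destabilization by underwinding (ref. 28)
BEND_KCAL = 25.0              # maximum destabilization by bending (ref. 28)

R_KCAL = 1.987204e-3          # gas constant, kcal/(mol*K)

def interaction_energy_kcal(area_a2, cal_per_a2):
    """Interaction energy (kcal) implied by a buried surface area."""
    return area_a2 * cal_per_a2 / 1000.0

def binding_free_energy_kcal(kd_molar, temp_k=298.15):
    """Assumed illustration: dG = RT ln(Kd), 1 M standard state, in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_molar)

budget = interaction_energy_kcal(BURIED_AREA_A2, CAL_PER_A2)
print(f"interaction energy budget: {budget:.1f} kcal")  # 67.5, i.e. ~68 kcal
print(f"dG for Kd = 1e-9 M: {binding_free_energy_kcal(1e-9):.1f} kcal/mol")

# Can either distortion alone cover the upper melting estimate?
for label, energy in (("underwinding", UNDERWIND_KCAL), ("bending", BEND_KCAL)):
    print(label, energy >= max(MELT_COST_KCAL))          # True for both
```

The sketch only restates the budget argument. The 25 cal/Å² factor is an empirical approximation, so the result is an upper-bound estimate rather than a measured energy.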

Non-template DNA. The single-stranded non-template DNA is well separated from the heteroduplex in the transcription bubble (Figs. 1B, 3C), which is held open by extensive interactions between the polymerase and both the template DNA-RNA heteroduplex and the single-stranded non-template strand. The non-template DNA is immobilized at two points along the transcription bubble. At the upstream fork, it interacts with the thumb domain and with the outer surface of the RNA exit tunnel that is formed by sub-domain H. Bases at positions n-2 and n-3 are flipped out of the helical axis and stack with Arg173 of sub-domain H and Tyr385 of the thumb domain, respectively. At the downstream fork, it interacts with the fingers domain. The non-template DNA interacts with one side of sub-domain H, while the RNA message interacts with the other (Fig. 3C).

Crystal structures of the initiation complex and the open promoter complex did not show non-template DNA downstream of -5 (promoter numbering). In both complexes, the DNA duplex from -J to -4 was melted, with the template strand firmly bound and plunging into the active site, while the non-template strand was disordered (10, 11). If the non-template strand is modeled into these initiation complexes at the position it occupies in the elongation complex, no plausible interactions are apparent, because sub-domain H lies on the opposite side of the molecule in the initiation complex and the position of the thumb is also altered. This observation agrees with biochemical studies showing that the presence of non-template DNA in the bubble region stabilizes the elongation complex but has little effect on stabilizing the initiation complex (31).

Science Express / www.sciencexpress.org / 19 September 2002 / Page 4 / 10.1126/science.1077464

Strand separation of downstream DNA. Since DNA-dependent RNA polymerases transcribe double-stranded DNA, they must displace the non-template strand from the downstream duplex as it enters the bubble to generate the single-stranded template, thereby functioning as a helicase in addition to a polymerase. Two components of the elongation complex structure, sub-domain H and Helix Y (residues 644-661), appear to be involved in this process. Helix Y is wedged in the fork where the template and non-template strands separate from the downstream duplex, whereas sub-domain H stabilizes the non-template strand of DNA. A bulky amino acid residue, Phe644, at the end of the Y helix extends outward and stacks on the template base at position , the first base pair at the downstream end of the transcription bubble (Fig. 5). Helix Y serves to divert the direction of the non-template strand, promoting its separation from the template; this is analogous to the role of the thumb helix in diverting the direction of the 5' end of the RNA transcript as it separates from the template.

Ever since the first structural studies of the E. coli Klenow fragment of Pol I (13, 32), and continuing through those of substrate complexes with the Pol I family of enzymes (14, 14, 33), it has remained obscure how DNA polymerase I is able not only to fill single-stranded gaps but also to displace the RNA primers of Okazaki fragments and synthesize DNA, leaving only a nick in the DNA duplex (34). Comparison of the structure of T7 RNAP bound to downstream duplex DNA with that of the Pol I family of DNA polymerases provides structural insights into this process. Superposition of the Cα backbones of the palm domains of T7 RNAP and the E. coli Klenow fragment (KF) aligns the homologous portions of the respective fingers domains, and the downstream duplex DNA bound to T7 RNAP fits well onto the fingers domain of KF (Fig. 5). Helix Y of T7RNAP aligns precisely on helix M of KF, which lies between the template and non-template strands. Furthermore, Phe644 of T7 RNAP, which stacks on the last template base of the downstream duplex, is positioned identically to Phe771 of KF. The position occupied by this Phe is conserved as a large hydrophobic residue in all Pol I family polymerases, implying that the Pol I family polymerases all use a similar mechanism for binding downstream duplex and separating the two strands. A similar structure is not present in other DNA polymerases, such as the B family of replication polymerases, which do not exhibit strand displacement ability.

In the model of Pol I constructed with downstream DNA, the non-template strand departs the Pol I downstream duplex in the direction of the 5' nuclease domain (35), which is responsible for cleaving the Okazaki RNA, while the template strand enters the polymerase active site in the same way as the template strand in Pol I DNA polymerase binary complexes (24). In the DNA Pol I binary complexes, as in T7RNAP, the template base that will pair with the incoming dNTP occupies position n and lies in a pocket until the incoming nucleotide arrives to form the ternary complex of enzyme, primer-template and NTP. After nucleotide insertion in DNA Pol I, the template nucleotide n becomes base paired, creating continuous duplex DNA with a nick in the non-template strand between the n base pair and the n+1 base pair, which is the first in the downstream duplex.

Inhibition of T7 RNAP transcription by T7 lysozyme. Biochemical data implied that T7 lysozyme may inhibit transcription by preventing T7 RNAP from undergoing the transition from the initiation to the elongation phase (36), although previous structural data implied that lysozyme binding may additionally alter the site of catalysis by repositioning the C-terminus (10). T7 lysozyme negatively regulates T7 RNAP by binding to it either during or before the initiation phase of transcription, in which case only short abortive transcripts are made; T7 RNAP is largely unaffected by T7 lysozyme once it has entered the elongation phase, except that there is reduced synthesis past pause sites containing an RNA helical hairpin (6, 8). The co-crystal structure of T7 RNAP and lysozyme shows the polymerase largely in its initiation-phase conformation (except for the extreme C-terminus), with lysozyme bound to the C-terminal domain at some distance from the promoter and catalytic sites (9, 75). The T7 lysozyme structure was modeled onto the elongation conformation of T7 RNAP after superposition of the lysozyme and elongation complex palm domain structures. The lysozyme fits well over most of its interaction surface, and the specificity loop moves significantly closer to the lysozyme in its elongation complex conformation. Comparison of the lysozyme binding sites on T7 RNAP in the initiation and elongation conformations reveals no striking differences. Lysozyme bound to the elongation conformation, however, would be immediately adjacent to the specificity loop and not far from the tunnel exit. Perhaps the 5' end of the message and/or the specificity loop makes a new interaction that prevents elongation of the transcript beyond 15 nucleotides, as is observed biochemically (8). We must conclude that our understanding of the structural basis for T7RNAP inhibition by lysozyme is at present incomplete.

Comparison of T7 RNAP with the multisubunit RNA polymerases. Comparisons of the structures of T7 RNAP and its various substrate complexes with those of the multisubunit DNA-dependent RNA polymerases that have recently been determined (37, 38) show several similarities as well as a few differences. The angle (~90°) between the axes of the downstream DNA and the heteroduplex in the T7 RNAP elongation complex is similar to that observed in the yeast Pol II elongation complex (38) (although the dihedral angles relative to the primer terminus differ), and the length of the heteroduplex is similar (8 base pairs vs 9 base pairs when comparing post-insertion states). Although there are three unpaired template bases between the last nucleotide of the heteroduplex and the first observed base pair of downstream duplex of Pol II, they are in the B-form DNA conformation and could be base paired in a true transcription bubble, in which case the terminal base pair of the heteroduplex and the first base pair of the downstream duplex would be adjacent, as in T7 RNAP. The functional reasons for these similarities in the structures of substrates bound to two nonhomologous RNA polymerases are unclear, but they may be related to common mechanisms of translocation, duplex opening, or access to and/or correct selection of the incoming NTP. T7 RNA polymerase has a tunnel-like opening that provides access for the incoming NTP; comparable openings are found in the DNA polymerases, most notably the β-family polymerases (39), and in the multisubunit RNA polymerases (referred to as the "funnel" in yeast Pol II) (37), where they admit the incoming dNTP or NTP. The large angle between the heteroduplex and downstream DNA allows unhindered access of the incoming nucleotide to the primer terminus. Also analogous in the two polymerases, the binding of downstream DNA to yeast Pol II results in the rotation of a domain, called the flap, that sequesters the DNA, a conformational change that may be functionally similar to the closing of loops in the fingers domain of T7 RNAP around the downstream DNA after it binds. Such sequestering of downstream DNA upon its binding to either polymerase family seems unlikely to be responsible for processive elongation synthesis (37, 38), since such changes presumably also occur upon formation of the initiation complex.

After superimposing the structure of the Taq RNA polymerase complex with promoter DNA (40) on that of yeast Pol II complexed with downstream DNA (38), the bend angle between the upstream DNA of the former and the downstream DNA of the latter can be measured and is again similar between the multisubunit polymerase (110°) and T7 RNAP (100°). The upstream and downstream duplex binding sites are about 2 and 1.5 times larger, respectively, in the multisubunit polymerase than the corresponding interaction sites in T7 RNAP. This difference may be related to the larger energy required to open the 12-nucleotide bubble in the former compared with the 6-nucleotide bubble in the latter. Further, like T7 RNAP, both the bacterial and the yeast RNAPs have a presumed RNA exit tunnel that lies near the 5' terminus of the RNA in the heteroduplex and that also exists in the apoenzymes. However, in the bacterial holoenzyme, which also contains the σ subunit that is responsible for promoter-specific initiation, the tunnel is blocked by a domain of σ (37). It has been suggested that the transition from the initiation phase to the elongation phase in bacteria is triggered by formation of an RNA transcript that is sufficiently long to displace σ from the tunnel, thereby facilitating σ's dissociation from the complex and promoter release (40, 41), a hypothesis yet to be verified. Only when a transcript is long enough to displace σ and pass through the tunnel would processive synthesis commence. Presumably, processive RNA synthesis in the elongation phase results from the RNA transcript being surrounded by protein in both polymerase families. A major difference, then, between the multisubunit and T7 RNAPs is the massive conformational change exhibited by the latter to form an exit tunnel that already pre-exists in the former but is blocked in the initiation phase by σ. The unprecedented conformational dexterity exhibited by T7 RNAP may be a consequence of the limited genome space of the T7 phage, which may impose the requirement for this dual functionality of promoter recognition and tunnel formation by the N-terminal domain.

Conclusion. The crystal structure of a T7RNAP elongation complex shows that the N-terminal domain rearranges from its structure in the initiation complex, which destroys the promoter binding site and creates both a channel that accommodates a 7-base-pair heteroduplex and a tunnel through which the transcript passes after peeling off the heteroduplex. These features account for the enzyme's processivity in the elongation phase as well as the phenomenon of promoter clearance. The fingers domain forms a binding site for 10 base pairs of downstream DNA whose orientation relative to the upstream promoter seen in previous complexes suggests that the enzyme uses the interaction with upstream and downstream duplex to bend and unwind DNA at the transcription start site and thus facilitate promoter opening. The template and non-template strands entering the active site from the downstream DNA are separated by an α-helix that is also present in the homologous DNA polymerase I, which may explain how DNA polymerase I is able to displace the 5' end of the non-template strand. Comparison of the structural differences between the transcribing RNA polymerase complexes and the homologous DNA polymerase I explains how the additional functional properties exhibited by the RNA polymerase are acquired.
Fig. 1. Substrates in the T7RNAP elongation complex. (A) The substrate construct co-crystallized with T7 RNAP consisted of 30 nucleotides each of template DNA (blue) and non-template DNA (green) that are complementary (except for a central 11 nucleotides), and a 17-nucleotide RNA (red) whose 3'-terminal 10 nucleotides are complementary to the template DNA. The nucleotide that templates the nascent NTP is numbered 0; upstream nucleotides are given negative numbers and downstream nucleotides positive numbers. The portions of RNA and DNA that are not visible in the map are outlined by dashed lines. (B) A portion of the composite omit electron density map corresponding to the substrate, contoured at 1.1σ. Figures 1 and 4 were made with the program SPOCK (42).

Fig. 2. Comparison of the structures of the T7 RNAP initiation and elongation complexes. The initiation complex (A) and elongation complex (B) have been oriented equivalently by superimposing their palm domains. Helices are represented by cylinders and β-strands by arrows. The corresponding residues in the N-terminal domains of the two complexes that undergo major refolding are colored yellow, green and purple, and the C-terminal domain (residues 300-883) is colored gray. The template DNA (blue), non-template DNA (green) and RNA (red) are represented with ribbon backbones. The proteolysis-susceptible region (residues 170-180) is part of sub-domain H (green) in the elongation complex and has moved more than 70 Å from its location in the initiation complex. The specificity loop (brown) recognizes the promoter during initiation and contacts the 5' end of RNA during elongation, while the intercalating hairpin (purple) opens the upstream end of the bubble in the initiation phase and is not involved in elongation. The large conformational change in the N-terminal region of T7RNAP facilitates promoter clearance. (C) The DNA and RNA trinucleotide seen in the structure of the initiation complex, docked to the downstream DNA of the elongation complex, show a 100° bend between upstream and downstream segments. (D) The DNA and 7 nucleotides of RNA observed in the structure of the elongation complex show a decreased angle of bending between the base-paired upstream and downstream segments. Panels (E), (F) and (G) show the three different conformational changes (helix formation, rigid-body movement and refolding) undergone in the transition between initiation and elongation. This figure and Figures 3, 5 and 6 were made with the program Ribbons (43).

Fig. 3. Views of the transcription bubble. (A) Global view of the elongation complex with a box outlining the active-site region, which is enlarged in (B), (C) and (D), with the thumb in yellow-green, subdomain H in green, the specificity loop in yellow and helix Y in red. (B) Conformational changes of the thumb and the specificity loop. The thumb domain as observed in the initiation complex (grey) has rotated about 15° in the elongation complex (green) and assists in the separation of the RNA transcript from the template DNA. In the initiation complex, the specificity loop (yellow) blocks the exit of RNA; in the elongation complex it (brown) has moved to open the exit tunnel and interact with the exiting RNA. The three base pairs of heteroduplex in the initiation complex (grey) superimpose on those of the elongation complex. (C) Interactions of the transcription bubble and heteroduplex in the elongation complex with subdomain H (green and red) and the specificity loop (brown). Proteolytic cuts within the red loop in subdomain H reduce elongation synthesis (3, 16). The thumb α-helix (yellow) and α-helix Y (orange) are analogously involved in strand separation. (D) Side chains from subdomain H (green), the specificity loop (brown) and the thumb that interact with the single-stranded 5' end of the RNA transcript and facilitate its separation from the template.

Fig. 4. Formation of the RNA exit tunnel and upstream DNA binding site in the elongation complex. (A) The solvent-contact surface of the initiation-complex conformation of T7 RNAP, with the observed upstream promoter DNA and heteroduplex along with the downstream DNA modeled from the elongation complex. The thumb domain has been removed from both (A) and (B) to allow a view into the heteroduplex binding site. (B) The elongation complex shows the disappearance of the promoter DNA binding site, the formation of a new channel that binds heteroduplex and upstream DNA nonspecifically, and a tunnel through which the transcript (red) is exiting. (C) and (D) are rotated by 180° about a vertical axis; (D) shows the appearance of the tunnel that contains the 5' end of an RNA transcript (red) in the elongation complex. Positive electrostatic potential is blue and negative is red. The two complexes have been oriented identically by superposition of their palm domains.
Fig. 5. The T7 RNAP fingers domain (grey) bound to the downstream DNA, superimposed on the corresponding part of the Klenow fragment fingers domain (yellow). The template DNA (blue) is redirected and separated from the non-template DNA (green) by α-helix Y and F644 in T7 RNAP. A corresponding α-helix M and Phe are found in the Klenow fragment, as are other portions of the downstream DNA binding site. Part of the RNA transcript is shown in red.
Table 1. Summary of crystallographic analysis

Data collection and SAD analysis
                                        Native 1              Native 2              SeMet (SAD)
Wavelength (Å)                          0.98                  0.98                  0.979
Resolution (Å)                          2.1                   2.2                   2.9
Completeness (overall/last shell, %)    [illegible]           [illegible]           [illegible]
Space group                             P2₁ (2 copies/a.u.)   C222₁ (1 copy/a.u.)   P2₁ (2 copies/a.u.)
Unit cell a, b, c (Å)                   [illegible]           [illegible]           100.8, 144.2, 102.0
Unit cell α, β, γ (°)                   90.0, 90.6, 90.0      90.0, 90.0, 90.0      90.0, 91.1, 90.0
No. of sites                            -                     -                     50
R_linear                                6.5                   8.2                   7.2
Phasing power (acentric reflections)    -                     -                     1.2
Solvent-flipping density modification
  SAD FOM                               -                     -                     0.62

Structure refinement
Resolution (Å)                          40-2.09 (Fo > 2σ(F))
R factor/R free                         24.1/27.3
No. protein residues/nucleic acid/water 862 amino acids (missing residues 1, 233-240, 363-374); 47 nucleotides; 190 water molecules

R_linear = Σ|I − <I>| / ΣI, where I is the observed intensity and <I> is the average intensity for multiple measurements of reflections.
Phasing power = r.m.s.(|Fph|/E), where E is the residual lack of closure.
FOM, figure of merit.
R free, R factor calculated using the test data set that was excluded from the refinement.
R = Σ||Fp| − |Fc|| / Σ|Fp|, where Fp and Fc are the observed and calculated structure factors.
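The R-factor definitions in the table notes translate directly into code. Below is a minimal Python sketch of R, of R free (the same statistic evaluated only on the held-out test reflections), and of R_linear. The function names and the toy data are hypothetical and are not taken from this structure. Conventions differ on whether reflections measured only once contribute to the R_linear denominator; this sketch includes them.

```python
from collections import defaultdict

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over paired reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

def r_work_r_free(f_obs, f_calc, is_test):
    """R over the working reflections and over the excluded test set (R free)."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free

def r_linear(measurements):
    """R_linear = sum(|I - <I>|) / sum(I); <I> is the mean over repeats of each hkl.

    `measurements` is an iterable of (hkl, intensity) pairs.
    """
    groups = defaultdict(list)
    for hkl, intensity in measurements:
        groups[hkl].append(intensity)
    num = sum(abs(i - sum(v) / len(v)) for v in groups.values() for i in v)
    den = sum(i for v in groups.values() for i in v)
    return num / den

# Toy data (hypothetical amplitudes and intensities)
f_obs = [100.0, 80.0, 50.0, 20.0]
f_calc = [90.0, 84.0, 50.0, 25.0]
print(round(r_factor(f_obs, f_calc), 3))  # 0.076
print(r_work_r_free(f_obs, f_calc, [False, False, True, True]))
print(round(r_linear([((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0)]), 4))  # 0.0385
```

In practice the test set is a small random fraction of reflections chosen before refinement. Because those reflections never influence the model, R free is expected to sit somewhat above R work, as it does in the table (24.1/27.3).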
Reference molecule: T7 1-883 (883 aa)    Homology

Sequence 2: C21 1-883 (883 aa)    98%

Sequence 3: T3 1-884 (884 aa)    88%

Sequence 4: K11 1-906 (906 aa)    82%

Sequence 5: SP6 1-874 (874 aa)    48%

Parameters set: Hiffliatcb = 3; Open Gap = 5; Extend Gap = 2

Conservative Amino Acid Changes Permitted



10 20 30 40 50 60 70 80 90 

********* 
T7 MNTI-NIAKHDFSDIELAAIPFNTUDHYGERLAREQLALEHESYEMGEARF^
C21 MNTI-NIAKNDFSDIELMIPFNTUDHYGERLAREQULEHESYEMGEARFI^FERQ 

T3 HNileNIeKNDFSelEUAIPFNTLADHYGsaLAkEQULEHESYElGErRFlKHlERQaKAGEiADNAAAKPLlaTLLPKlttRIveWlEEyasKkGrk
K11 HNal-NIgrNDFSeIELHIPyNiLseHYGdqaAREQIALEHEaYElGrqRFlKHlERQvKAGEfADNAAAKPLvlTIAPqltl*IdDWkEEqanaRGKk
SP6 Hq dUAI QLqLEeEmfngGirRFeadqqRQiaAGsesDtAwnrrLlseLiapHaeglqaykEEyegKkGra

110 120 130 140 150 160 170 
******* 

T7 PTAFQF LQEI---KPEAVAYITIKTTLACLTSADNTTVQAVASAIGRAIEDEARFGRIRDLEAOFKKIWEEQLNKRVGHVYK

C21 PTAFQF LQEI---KPEAVAYITIKTTLACLTSvDNTTVQAVASAIGRAIEDEAaFGRIRDLEAKHFlQWEEQLNl^VGHVYK 

T3 p S Ayap LQll — KPEAsAflTlKviLAsLTStnmTTiQAaAgilGkAIEDEARFGRIRDLEAKHFHQiVEEQLNKRhGqVYK 

K11 PrAyypikhgvaselavsmgaevLkEkrgvssEAiAllTIKvvLgnahrplkghnpAVsSql^^^

SP6 PrAlaF — i^ C v---enEvaAYITmKvvmdiiLnt--daTlQAiAmsvaerIEDqvRFskleghaAKyFeK-VkksLkasrtksYr 

190 200 210 220 230 240 250 260 

**** **** 

T7 KAFHQWEADH-LSKGLLGGEAWSSWHKEDSIPGVRCIEHL IESTGHVSLHRQNAGWGQDSETI ELAPEYAEAIATRAGA 

C21 KAFHQWEADH-LSKGLLGGEAWSSWHKEDSIHVGVRCIEHL IESTGHVSLHRQNAGWGQDSETI ELAPEYAEAIATRAGA 

T3 KAFHQWEADH-igrGLLGGEAWSSWdKEttmHVGiRlIEHL IESTGlVeLqRhNAGnaGsDhEal qLAqEYvdvlAkRAGA 

K11 KAFHQWEADH-iSKGmlXjGdnWaSWktdeqmHVGtkllElL iEgTGlVemtknkmadgsdDvtsmqmvqLAPafvEllskRAGA

SP6 hAhnvaVvAeksvaekdadfdrWeaKpKEtqlqiGttllEiLegsvfyngepvfmramrtyGgktiyylqt SEsv gqwlsafkeh 

270 280 290 300 310 320 330 340 350 

******* * * 

T7 LAGISPMFQPCWPPKPWTGITG&YWANGRRPUU^ H>IP 

C21 LAGISPMFQKWPPKPWTGITGGGYWANGRRPLALVsTHSKKALMRYED^ EDIP 

T3 i^GISPHFQPCWPPKPWvalTGGGYWANGRRPIALVRTHSKKgLMRYE^ aDIP 

K11 LAGISPIftQKWPPKPWvetvGGGYWsvGRRPLALVRTHSKKALrRYaDV^ qD»P

SP6 vAqlSPayaPCViPPrBJrtpfnGGfhtekvasrirlvkgnrehvrkltqkqHPkVYKAINalQNTqWqlNKdVLAVieevir ldlgygvpsfkP 

360 370 380 390 400 410 420 430 

* * ****** 

AIEREELPMKPEDIDMNP EA LTAWKRAAMVYRKDKARKSRRISLEFMLEQANKFANHKAIWFP 

C21 AIEREELPHKPEDIDtNP EA LTAWKRAAAAVYRKDKARKSRRISLEFMLEQANKFANHKAIWFPYNTOWK^ 

T3 slERqELPpKPdDIDtNe aA LkeWKl(AAAgiYRlDKARvSRRISLEFHLEQAMFAskKAIWFPYlIHD^RVY--AVpHFin ) ^ 

K11 AIEREELPprPdDIDtNe vA rl^WrkeMAVYRKDKARqSRRcrcEFHvaQAMFANffiAIWFPYJffll)WRGRV7--AVSHFNPQG

SP6 ndBnkPanPvpvefqhlrg^^ 
440 450 460 470 480 490 500 510 520 530

***** * * * * * 

NDHTKGLLTLAKGKP1-GKEGYYWLKIHGANCAGVDKVPFPERIKFI — EEMHENIHACAKSPLEHTWWAEQDSPFCFLAFCPEYA GVQHBGLSYNC 
cDHTKGpwTlJU(GKPI-GKEGYYWIlIHGAHCAGVDKVPFPERIKFI--EdlffldNIHrCAKSPLENTWWAEQDSPFCFLAFCFEYA GVQHHGLSYHC 
NDHTKGLLTLAKGia 3 I-G€:EGfY»MIHGAHCAGVDKVPFPERIaFI--EkhvddIlACAKdPillNTWWAEQ^ 

NDMTKGsLTIJ^GIG 3 I-G]dGfYWIIIHGANCAGVDKVPFPERIKFI--EF^ GVkBHGLnYHC 
NDlgKaLI^fteGrPvnGvEalkWfcInGAHlwGwDKktFdvRvsnvldEEfqdiDcrdiAadPLtfTqWAkaDaPyeFLAwCFEYAqyldlvdeGradef 

540 550 560 570 580 590 600 610 620 
******** * 
--SLPUFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQ^ 
--SIJLAFMSCSGIQHFSAHLIDEVGGRAVNLLPSEW^ 
--SUUFDGSCSGIQHFSAMLRDEVGGRAVNLLPSETVQDIYGIVAqKVNEI^ 
--SLPLAFDGSCSGIQHFSAHLRDsiGGRAVNLLPSdTVQDIYklVAdKVHEvLhqhAv 
rthUMqDGSCSGIQHySmRDWGakAmkPSda^^ 

630 640 650 660 670 680 690 700 

* * * * * * * * 

RSVTKRSVHTLAYGSKEFGFRQQVLEDT IQPAIDSGKG LHFTQPHQAAGYHAKLIWESVSVTWAAVEAHHWLKSAAKLLA 

RSVTKRSVHTLAYGSKEFGFRQQVLEDT IQPAIDSGKG LHFTQPHQAAGYHAKLIWEaVSVTWAAVEAHNWLKSAAKLLA 

RSVTKRSVHTLAYGSKEFGFRQQVLdDT IQPAIDSGKG LHFTQPNQAAGYHAKLIWdaVSVTWAAVEAHHWLKSAAKLLA 

RkVTKRSVHTLAYGSKEslvRQQVLEDT IQPAIDnGeG LHFThPHQAAGYHAKLIWdaVtVTWAAVEAHHWLKSAAKLLA 

RSlTKkpVMTLpYGStrltcResVidyivdleekeaQ^ 

710 720 730 740 750 760 770 780 790 800
********** 
AEVKDKTGEILRKRCAVHWVTPDGFPWQEYKK^ 

AE\T(DKKTGEILRKRCAVHWVTPDGFPWQEYK1CPIQTRLNLHFLGQFRLQPTINTNKDSEIDAHKQESGIAPHFVHS 

AEVKDKKTkEILRhRCAVHWtTPDGFPVWQEYrKPlQkRLdiiFL^ 

AEVKDlOCTkEvLRKRCAiHWVTPDGFPVWQEYrKq^ 

KrnegliiytlttGFileQkiraateBlRvrtcliaGdi]^^ 

810 820 830 840 850 860 870 880 

, * * * *'* * * * 

ALIHDSFGTIPADAANLFiUWRETHVDTY-ESCDVI^ 

ALIHDSFGTIPrDAANLFIUWRETHVDTY-ESCDVLADFYDQFADQI^ 

ALIHDSFGTIPADAgkLFKAVRETMViTY-EnnDVI^FYsQFADQLHEtQLDK^ 

ALIHDSsGTIPADAgNLFlUiVRETHVkTY-EdnDViADFYDQFADQLHESQUJKHPAvPAKGdIiHIJffllLESDFAFA 

AvIHDSFGThadntltLrvAlkgqHVaraYidgnalqklleehevrwivdtgie vPeqGefdLnelndSeyvFA 
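The alignment header reports a single "Homology" score per sequence relative to T7, computed with conservative substitutions permitted. The sketch below shows, in general terms, how identity and similarity can be tallied from two rows of a gapped alignment. Several choices here are assumptions made for illustration: the substitution groups, the denominator (columns where the reference has a residue) and the toy rows. The alignment program's actual scoring rules are not given in the source, so this sketch is not expected to reproduce the reported percentages.

```python
# Hypothetical conservative-substitution groups (an assumption; the alignment
# program's own similarity rules are not stated in the source).
CONSERVATIVE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")

def is_conservative(a, b):
    """True if residues a and b fall in the same hypothetical group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)

def pairwise_scores(ref, other):
    """Percent identity and similarity of `other` against `ref`.

    Both strings are rows of one alignment ('-' = gap) and must be equal in
    length; only columns where `ref` has a residue are counted.
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    cols = [(a.upper(), b.upper()) for a, b in zip(ref, other) if a != "-"]
    if not cols:
        raise ValueError("reference row contains no residues")
    identical = sum(a == b for a, b in cols)
    similar = sum(a == b or (b != "-" and is_conservative(a, b)) for a, b in cols)
    return 100.0 * identical / len(cols), 100.0 * similar / len(cols)

# Toy rows (hypothetical, not taken from the alignment above)
print(pairwise_scores("MKVLE-D", "MRVIE-E"))  # (50.0, 100.0)
print(pairwise_scores("ACDE", "A-DK"))        # (50.0, 50.0)
```

Similarity is always at least identity, since every identical column also counts as similar. A score such as "Homology" that permits conservative changes therefore reads higher than a strict identity percentage.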



